1. Introduction


Warfighters  working  with  robots  are  at  the  cutting  edge  of  the  Future  Combat  Systems  (FCS) 
fighting  forces.  These  individuals  work  with  a  diverse  set  of  land,  air,  sea,  and  undersea  vehicles 
capable  of  a  variety  of  missions.  The  missions  vary  and  can  include  unattended  sensors, 
reconnaissance,  search  and  rescue,  medical  support,  and  direct  contact  with  enemy  assets,  with 
the  systems  ranging  from  single  sensors  to  multirobot  systems.  Examples  include  FCS 
technologies network, TALON, iRobot PackBot, the SPARTAN Advanced Concept
Technology  Demonstration,  and  the  Family  of  Integrated  Rapid  Response  Equipment  sensors 
and  vehicles  (Powell  et  al.,  2006).  Just  as  the  missions  and  systems  vary  greatly,  so  do  the 
operator  control  units  and  multioperator  control  unit  interfaces  employed  to  operate  the  robots. 
This  variety  of  missions,  robot  types,  and  interfaces  can  be  difficult  to  train  for  and  manage.  It  is 
therefore  essential  to  identify  the  cognitive  and  task  demands  being  placed  on  the  warfighter  to 
ensure  successful  mission  outcomes. 

Several  different  approaches  are  necessary  to  cover  the  criterion  space  of  these  cognitive  and 
task  demands.  The  main  strategy  utilized  here  is  an  evaluation  of  the  existing  literature  on 
human-robot  interaction  (HRI).  Existing  documents  from  the  academic  and  the  U.S.  Army 
Research Laboratory literatures were examined and coded. The major classification
dimensions identified were the number of platforms controlled, task difficulty
comparisons,  level  of  control  by  platforms,  cuing/decision-making  reliability,  stereoscopic  (SS) 
vs.  monoscopic  (MS)  display,  comparisons  between  modalities,  comparisons  within  modalities, 
frame  rate  (FR),  field  of  vision  (FOV),  latency/time  delay,  and  camera  perspective.  A  summary 
of  these  documents  is  available  upon  request. 

This  report  contains  several  sections  that  support  the  taxonomy  and  provide  recommendations 
for  future  multimodal  displays  and  research.  Sections  2-4  were  originally  three  separate  papers, 
each  elaborating  on  specific  aspects  of  the  taxonomy.  Each  section  covers  a  particular  topic  in 
HRI.  Section  5  presents  proposals  for  follow-on  HRI  research. 

Due  to  size  constraints,  a  separate,  in-depth  analysis  of  HRI  cognitive  task  dimensions  is  not 
presented  here  but  is  available  upon  request  from  the  authors.  The  in-depth  analysis  exists  in 
two  parts.  The  first  portion  is  in  this  report  and  the  second  exists  online.  A  database  was  created 
in  RefWorks  (2009)  of  articles  eligible  for  meta-analysis.  The  coding  sheet  for  the  articles  and 
instructions  for  using  this  database  are  also  available  from  the  authors.  The  database  itself  exists 
online  and  is  available  via  the  Web  at  http://www.refworks.com/. 

Especially notable are any guiding principles culled from each article. Section 6 concludes with
a reference list of the articles in the meta-analysis folder of the RefWorks database. These
studies  have  been  screened  and  coded  as  being  eligible  for  meta-analysis. 
2. Workload in Human-Robot Interaction: A Review of Manipulations and
Outcomes


The  current  study  reviews  the  relationship  between  manipulations  of  teleoperator  workload  and 
task  outcomes,  using  multiple  resource  theory  as  the  underlying  framework.  Results  indicated 
that controlling more than two platforms is detrimental to many performance indices (reaction
time  [RT],  error  rate  [ER]),  but  overall  productivity  improves.  For  studies  that  manipulated 
workload  for  a  single  robot  task,  visual  demands  were  a  limiting  factor,  and  interventions  that 
reduced  visual  demands  improved  performance.  We  conclude  with  guiding  principles  for 
managing  workload  and  improving  teleoperator  performance. 

2.1  Introduction 

Autonomous  agents  have  become  an  essential  tool  for  a  myriad  of  tasks.  Through  the  use  of 
unmanned  aerial  vehicles  (UAVs)  and  unmanned  ground  vehicles  (UGVs),  service  personnel  can 
carry out tasks with a reduced risk to their safety. In recognition of these
advantages, there has been increased interest in understanding and improving HRI (Chen et
al.,  2007).  From  a  human  factors  perspective,  understanding  and  mitigating  the  impact  of 
workload  should  improve  performance  in  HRI.  This  section  addresses  the  issue  of  workload  in 
HRI  through  a  review  of  the  experimental  literature.  Existing  research  has  examined  a  multitude 
of  manipulations  and  outcomes  of  workload  demands,  but  a  synthesis  is  needed  to  understand  the 
state of the current research. The current review meets this need by integrating HRI studies
according  to  manipulations,  tasks,  and  outcomes  in  order  to  draw  guiding  principles. 

2.1.1  Workload  Manipulations  in  HRI 

This  section  utilizes  multiple  resource  theory  (MRT)  as  the  framework  for  workload  in  HRI,  as 
described  by  Wickens  (2002).  The  main  tenets  of  MRT  suggest  that  multiple  cognitive  resources 
allow  for  multitasking  or  time-sharing  performance.  Specifically,  tasks  requiring  different 
cognitive  resources  can  often  be  effectively  performed  together,  but  competition  for  the  same 
resource(s)  can  produce  interference.  Much  of  the  recent  work  on  MRT  has  defined  these 
resource  channels  while  predicting  the  degree  to  which  information  from  strained  resource 
channels  can  be  effectively  offloaded  to  less-used  channels.  To  summarize,  tasks  may  strain 
cognitive  resources  through  verbal,  manual,  or  sensory  demands  (for  a  complete  review,  see 
Wickens  [2002]). 

Controlling  a  platform  or  interacting  with  an  artificial  agent  imposes  many  demands,  such  as 
executing  menu  functions,  navigating  to  waypoints,  manipulating  a  foreign  object,  processing 
information  from  data  uplinks,  and  communicating  with  team  members.  Most  manipulations  of 
HRI  workload  stem  from  changing  the  number  of  robots  available  or  manipulating  the  demands 
of a single task or resource. Multirobot control affects workload by increasing the number of
subtasks (monitoring, navigating, and executing). Although providing a user with more than one
platform  to  control  will  certainly  increase  workload,  will  this  additional  strain  outweigh  the 
benefit  of  having  multiple  robots  to  execute  task  actions?  Addressing  this  question  may  depend 
upon  the  tasks  being  performed  and  the  criteria  desired.  Thus,  we  examine  the  issue  of 
multirobot  control  by  reviewing  the  HRI  literature  according  to  the  tasks  and  criteria  studied. 

In  contrast  to  manipulations  of  robot  quantity,  other  manipulations  of  workload  focus  on  a  single 
task or cognitive resource. These interventions frequently include changing the performance
standard  (e.g.,  number  of  targets  to  process)  or  changing  the  environmental  complexity  (e.g., 
terrain  detail).  Whereas  environmental  complexity  should  impact  primarily  sensory  (visual) 
demands,  performance  standards  are  more  likely  to  affect  responding  demands.  A  review  of 
these  manipulations  should  reveal  the  practical  limitations  of  various  cognitive  resource  channels 
for  HRI  tasks. 

2.1.2  Purpose 

Now  that  MRT  and  the  common  workload  manipulations  in  HRI  have  been  outlined,  the  purpose 
of this section is to draw guiding principles for teleoperator* workload and performance. A
qualitative  review  will  allow  us  to  compare  the  effects  of  distinct  workload  manipulations  across 
a  variety  of  tasks  and  study  criteria.  To  analyze  the  literature,  a  systematic  coding  process  was 
applied  to  the  extant  database,  described  next. 

2.2  Method 

2.2.1  Literature  Search 

The  literature  search  included  a  query  using  several  scientific  and  military  electronic  databases, 
including  the  Defense  Technical  Information  Center  (DTIC),  the  Association  for  Computing 
Machinery  (ACM),  and  the  Institute  of  Electrical  and  Electronics  Engineers  (IEEE).  References 
from a recent HRI review (Chen et al., 2007), as well as from the experimental studies obtained, were also
checked  for  eligibility.  Finally,  a  hand  search  was  conducted  on  the  following  journals  and 
proceedings  for  the  past  5  years:  Human  Factors,  Presence,  Human  Computer  Interaction 
(HCI),  and  IEEE. 

2.2.2  Coding  Procedure  and  Inclusion  Criteria 

Before  coding,  raters  reviewed  the  variables  of  interest,  constructed  a  coding  sheet  to  reflect 
them,  and  accordingly  screened  articles  for  eligibility.  Five  studies  were  then  selected  and  coded 
by all raters to examine validity and agreement. Given acceptable agreement, one of the five
raters then coded the studies for this review using the definitions described below.
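
The report does not state which agreement statistic the raters used. As an illustration only, the following Python sketch computes mean pairwise percent agreement, one simple index that could serve this purpose. The function name, rater labels, and codes are hypothetical and are not taken from the actual coding sheet.

```python
from itertools import combinations


def pairwise_percent_agreement(codes_by_rater):
    """Mean share of items on which each pair of raters assigned the same code."""
    pair_scores = []
    for rater_a, rater_b in combinations(codes_by_rater, 2):
        codes_a = codes_by_rater[rater_a]
        codes_b = codes_by_rater[rater_b]
        if len(codes_a) != len(codes_b):
            raise ValueError("raters must code the same set of items")
        matches = sum(a == b for a, b in zip(codes_a, codes_b))
        pair_scores.append(matches / len(codes_a))
    return sum(pair_scores) / len(pair_scores)


# Hypothetical data: three raters assign a primary criterion to five studies.
example_codes = {
    "rater_1": ["errors", "RT", "production", "workload", "SA"],
    "rater_2": ["errors", "RT", "production", "workload", "efficiency"],
    "rater_3": ["errors", "RT", "efficiency", "workload", "SA"],
}

print(f"{pairwise_percent_agreement(example_codes):.2f}")  # prints 0.73
```

Percent agreement does not correct for chance agreement; a chance-corrected index such as Cohen's or Fleiss' kappa would be a stricter alternative.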


* The word “teleoperator” is broadly defined here and refers to an individual operating a device from a remote location.
To be included in the present review, an article was required to report a study that experimentally
compared operator performance between different workload conditions. Furthermore, tasks had
to  utilize  artificial  agents  or  involve  teleoperation.  Thus,  studies  that  used  equipment  for 
non-HRI  tasks  (e.g.,  cockpit  simulators)  were  excluded  from  this  review.  Criteria  included 
measures  of  (1)  production  (e.g.,  number  of  actions),  (2)  errors  (e.g.,  incorrect  actions),  (3)  RT, 
(4)  efficiency  (e.g.,  time  to  task  completion),  (5)  perceived  workload  (e.g.,  the  National 
Aeronautics  and  Space  Administration  Task  Load  Index  [NASA-TLX]  scores),  and  (6) 
situational  awareness  (SA).  Finally,  study  characteristics  such  as  the  design  (e.g.,  repeated 
measures),  sample  (e.g.,  student),  task,  and  apparatus  (e.g.,  UAV)  were  noted  during  coding. 
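
The coding variables listed above can be pictured as one record per study. The sketch below is a hypothetical Python rendering of such a coding-sheet entry, not the authors' actual sheet. The class, field, and method names are assumptions, and the eligibility check simply restates the two inclusion criteria described in this section.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Set


class Criterion(Enum):
    """The six outcome categories named in the inclusion criteria."""
    PRODUCTION = "production"
    ERRORS = "errors"
    REACTION_TIME = "RT"
    EFFICIENCY = "efficiency"
    PERCEIVED_WORKLOAD = "perceived workload"
    SITUATIONAL_AWARENESS = "SA"


@dataclass
class CodedStudy:
    """One hypothetical coding-sheet entry."""
    citation: str
    manipulation: str
    criteria: Set[Criterion]
    design: str = ""      # e.g., repeated measures
    sample: str = ""      # e.g., student
    task: str = ""
    apparatus: str = ""   # e.g., UAV
    uses_hri_task: bool = True                  # artificial agent or teleoperation
    compares_workload_conditions: bool = True   # experimental workload comparison

    def is_eligible(self) -> bool:
        """Apply the inclusion criteria: HRI task, workload comparison, coded outcome."""
        return (self.uses_hri_task
                and self.compares_workload_conditions
                and bool(self.criteria))


# Entry filled in from the Table 1 row for Chadwick (2006); design and sample
# are left blank because this section does not report them.
included = CodedStudy(
    citation="Chadwick, 2006",
    manipulation="One, two, or four UGVs",
    criteria={Criterion.REACTION_TIME},
    task="hitting targets and correcting navigational errors",
    apparatus="UGV",
)

# Hypothetical non-HRI study of the kind the review excludes.
excluded = CodedStudy(
    citation="Hypothetical cockpit-simulator study",
    manipulation="Low or high time pressure",
    criteria={Criterion.ERRORS},
    uses_hri_task=False,
)

print(included.is_eligible(), excluded.is_eligible())
```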

2.3  Results 

Table  1  lists  the  citations  for  the  18  studies  assessing  multirobot  control,  the  number  and  type  of 
platform  used,  the  measured  task  outcomes,  and  key  findings.  In  general,  samples  ranged  from 
students  to  aviation  and  HRI  professionals.  Tasks  predominantly  included  navigating  platforms 
to  targets  or  areas  of  interest,  executing  an  action  (e.g.,  inspection,  manipulation),  and 
monitoring  and  responding  to  system  gauges  and  alerts. 

When results are examined by task performance measure, a trade-off emerges
between production and other measures. In many studies, teleoperators could execute more total
actions as they controlled more platforms (e.g., Crandall and Cummings, 2007; Lif et al., 2007;
Squire et al., 2006). However, increasing the number of platforms also increased ERs in
targeting and navigation (e.g., Dixon and Wickens, 2003; Galster et al., 2006), and it tended to
increase RTs (e.g., Chadwick, 2006; Levinthal and Wickens, 2006). These results suggest that
the control of multiple platforms allows the teleoperator to accomplish more tasks overall
because of the increased resources. However, this added productivity comes at a cost of
accuracy and efficiency. Although the control of one robot was optimal for task errors and RT
across studies, the control of two robots did not inhibit performance to nearly the same degree as
control of four or more robots (Adams, 2009; Chadwick, 2006; Ruff et al., 2002). Thus, control
of two platforms might offer the best balance between speeded performance and
ER.

Finally, automation and multimodal feedback were examined as methods of reducing the
cognitive workload imposed by additional platforms. In the case of automation, reliability made a
much  greater  impact  than  the  degree  or  type  of  automation  (Levinthal  and  Wickens,  2006;  Ruff 
et  al.,  2004).  The  addition  of  audio  feedback,  on  the  other  hand,  provided  a  consistently  more 
positive  effect  (Wickens  et  al.,  2003;  Dixon  and  Wickens,  2003). 

Table  2  presents  the  manipulation  and  the  task  affected  as  well  as  key  findings  for  the  17  studies 
examining  task  demands.  The  types  of  devices  used  had  more  variability  in  this  sample  than  in 
multirobot samples, including a robotic arm interface (Park and Woldstad, 2000), a
decision-making simulation (Hendy et al., 1997), and virtual environments (VEs) from a variety of
perspectives. 
Table 1. Study summaries on multirobot control.

| Study | Manipulation | Criteria (by Task Type) | Key Findings |
|---|---|---|---|
| Adams, 2009 | One, two, or four UGVs | No. of actions, efficiency, and workload for search and transfer | • Slight differences between one and two UGVs, but efficiency and perceived workload were worse with four robots. |
| Chadwick, 2005 | One or two UGVs | Targeting errors, navigation errors, and perceived workload | • No significant differences between groups. |
| Chadwick, 2006 | One, two, or four UGVs | RT to hit target, RT to correct navigational error | • Response times degraded slightly from one to two UGVs. • Response times degraded markedly from two to four UGVs. |
| Chen et al., 2008 | One or three UGVs and/or UAVs | Errors, efficiency, SA, and workload in targeting (with navigation) | • Targeting errors were equal between three platforms and single UAV or UGV, but perceived workload and efficiency suffered. |
| Crandall and Cummings, 2007 | Two, four, six, or eight UGVs | Errors and efficiency in navigation and target detection/transfer | • Four and two UGV conditions exhibited fewest lost robots. • Six and eight UGV conditions yielded highest no. of target successes. |
| Dixon and Wickens, 2003 | One or two UAVs | Tracking error, target reporting accuracy, RT to system alerts | • One UAV user had slightly better performance indices than two UAVs. • Adding auditory feedback improved performance across conditions. |
| Galster et al., 2006 | Four, six, or eight UAVs | Targeting accuracy, time processing key targets, RT to probes, workload | • Four UAV users had better accuracy and RT, but equal processing times. • Workload differences between conditions emerged for difficulty. |
| Humphrey et al., 2007 | Six or nine UGVs | Efficiency, workload, and SA in bomb disabling simulation | • No. of platforms also coincided with no. of bombs to defuse (difficulty). • Performance and workload indices were similar between conditions. |
| Levinthal and Wickens, 2006 | Two or four UAVs | Idle time during UAV navigation, RT to system alerts | • Users were less efficient when controlling four UAVs. • False alarms in automation hurt performance more than false misses. |
| Lif et al., 2007 | One, two, or three UGVs | Number of waypoints reached within given time (production) | • Users visited more waypoints controlling two or three UGVs (equally) than controlling one. |
| Murray, 1995 | One, two, or three sensors | Time to monitoring task completion | • Users were significantly slower completing the tracking task with three platforms than with one. |
| Parasuraman et al., 2005 | Four or eight UGVs | Completion time for game, no. of games won, workload | • Completion time and win rate deteriorated from four to eight UGVs. • As workload increased, automation features had a greater impact. |
| Ruff et al., 2002 | One, two, or four UAVs | Targeting accuracy, correct rejection rate of automation errors, workload | • One UAV user had the fewest rejection errors, two UAV users had the best targeting accuracy, and four UAV users reported the most workload. |
| Ruff et al., 2004 | Two or four UAVs | Targeting and navigation completion, RT to system alerts, workload | • All performance indices were better in two UAV conditions than four. • Reliability of automation, rather than level of automation, had greatest impact. |
| Squire et al., 2006 | Four, six, or eight UAVs | Total number of actions executed (production) | • Users performed increasingly more actions with more platforms. |
| Trouvain and Wolf, 2003 | Two, four, or eight UGVs | No. of inspections, no. of idle robots per second, time delay per inspection, workload | • Users performed more overall inspections with four and eight UGVs, but also had more idling time and efficiency loss. |
| Trouvain et al., 2003 | One, two, or four UGVs | Time to navigation task completion, deviation from optimal path (errors) | • Users of one UGV had optimal navigation performance. • Two and four UGV users were equal in performance. |
| Wickens et al., 2003 | One or two UAVs | Tracking error, system failure RT and errors, targeting time and errors | • One UAV user demonstrated faster reaction and targeting times. • Errors in tracking and system failure detections were equivalent. |
Table 2. Summary of studies manipulating single task demands.

| Study | Manipulation | Criteria (by Task Type) | Key Findings |
|---|---|---|---|
| Chen and Joyner, 2009 | Dense or sparse targeting area | Targeting errors | • Errors increased with more distractor objects around the target. • In difficult conditions, manual control outperformed semi-autonomy. |
| Cosenzo et al., 2006 | No. of targets to photo w/ UAV | Errors in targeting, RT to navigational decisions | • As no. of targets increased, targeting errors and RT to navigational stimuli increased. |
| Darken and Cervik, 1999 | Ocean or urban environment | Efficiency in navigation | • Users had stronger performance in visually sparse ocean environments than in complex urban environments, regardless of the type of camera. |
| Draper et al., 1991 | No. of alerts needing responses | Errors and RT in responding to UAV alerts | • Performance degraded as system alerts were more frequent; no interaction between condition and form of responses (manual vs. verbal). |
| Folds and Gerth, 1994 | Dense or sparse targeting area | RT to identify new threat in virtual tracking task | • RT to emerging threat was slower in dense environment. • Auditory warnings improved RT more so in dense environments. |
| Galster et al., 2006 | No. of targets to process | Errors, efficiency, and workload in processing targets; RT to probes | • Workload differences emerged favoring the low target condition. • Four UAVs yielded better performance with more targets than six or eight UAVs. |
| Hardin and Goodrich, 2009 | 200 or 400 distractor targets | Efficiency and errors in VE search and rescue | • No. of distractors had a significant effect on efficiency but not on errors; introducing autonomy did not mitigate this impact. |
| Hendy et al., 1997 | Low, medium, or high time pressure | Efficiency, error, and workload in air traffic control | • Performance dropped only at high levels of time pressure. • Workload indices increased sharply beyond low time pressure. |
| Mosier et al., 2007 | Low or high levels of time pressure | Errors and efficiency in diagnosing system problem in flight simulator | • Adding time pressure increased pilot efficiency but also increased diagnosis errors; this was worsened by system information conflicts. |
| Murray, 1995 | Complex or simple images | Efficiency in monitoring and tracking targets in VE | • Increasing image complexity increased target detection time. • Automated mobility improved user performance in complex conditions. |
| Park and Woldstad, 2000 | Size of destination for placement | Efficiency and workload in object transfer with robotic arm | • Less efficiency and higher workload in conditions with smaller targets. • 3-D displays helped performance with small targets. |
| Schipani, 2003 | Navigation distance | Workload ratings in VE navigation | • Workload increased with greater distance to travel. • Line of sight with the operator did not impact workload. |
| Sellner et al., 2006 | Simple or complex images | Efficiency and errors on task decision making (on stimuli) | • Simple displays decreased decision time but also increased errors. • Integrative presentations reduced the time penalty in complex displays. |
| Watson et al., 2003 | Distance in 3-D placement | Errors, efficiency, and usability on virtual object placement (helmet-mounted display [HMD]) | • Placement errors increased with greater distances in addition to task completion time; poor frame rate worsened this effect. |
| Witmer and Kline, 1998 (two studies) | Dense or sparse environment | Errors in distance estimation for VE | • More complex environments did not impact virtual distance estimation. |
| Yeh and Wickens, 2001 | Dense or sparse environment | Errors, workload, and trust on target detection | • Users had better performance with low (vs. high) environmental detail. • With reliably cued targets, the impact of visual detail was reduced. |
| Yi et al., 2006 | No. of targets to photo | Errors and SA in targeting with UAV | • Accuracy and SA decreased with more mission targets. • Workload conditions were not counter-balanced for practice effects. |

Increasing task difficulty generally decreased a variety of performance indices across a variety of
tasks. This suggests that, unlike the effects of controlling multiple platforms, the effects of
task demands are not criterion-dependent. Based on this review, HRI task performance is
particularly susceptible to strains on visual resources. This is evidenced by several relationships
reported in studies. First, users had better performance in visually sparse or simple environments
(e.g., Chen and Joyner, 2009; Darken and Cervik, 1999). Second, studies that manipulated visual
features to mitigate workload reported a positive impact from their interventions (e.g., Park and
Woldstad, 2000; Yeh and Wickens, 2001). Third, as visual demands were increased, audio feedback
tended to improve operator performance (e.g., Folds and Gerth, 1994). Because HRI tasks are
limited to interface and camera views, the visual channel will inherently receive greater strain
than most other resource channels. Based on the evidence presented here, one may reduce these
demands by reducing visual information (e.g., using integrative displays or lower environmental
detail) or by offloading information to other sensory channels (e.g., tactile, auditory).

2.4  Conclusions 

The  purpose  of  the  current  section  was  to  examine  the  available  research  and  determine  guiding 
principles  for  managing  workload  in  HRI.  Specifically,  this  section  examined  manipulations  of 
robot  number  and  task  demands  separately,  highlighting  results  by  task  and  criteria.  Results 
indicated  that  control  of  multiple  platforms  increases  user  productivity  to  the  detriment  of  RT, 
accuracy,  and  workload.  Results  from  manipulations  of  task  demands  suggested  that  visual 
strains are the primary limitation to teleoperator performance.

Results  of  this  section  yield  several  guiding  principles  for  managing  workload  in  teleoperators. 
First,  the  benefit  from  controlling  multiple  platforms  should  be  explicitly  weighed  against  the 
deterioration  of  other  performance  indices.  Researchers  and  practitioners  need  to  determine 
which  criterion  is  more  critical  to  task  success,  which  may  vary  according  to  the  situation.  For 
example,  overall  productivity  may  be  the  critical  outcome  for  search-and-rescue  operations, 
whereas  teleoperators  disabling  explosives  are  more  likely  concerned  with  correct  actions. 
Second,  workload  from  multirobot  management  may  be  alleviated  through  the  introduction  of 
practical  and  reliable  automation,  and  attention  management  may  be  facilitated  by  audio  alerts. 
Results  from  task  demand  manipulations  suggest  that  HRI  tasks  tend  to  strain  visual  resources, 
such  that  increasing  visual  demands  subsequently  increases  workload  and  reduces  performance. 
We  recommend  that  researchers  and  practitioners  consider  and  limit  these  demands.  Different 
approaches  can  reduce  these  visual  demands,  including  a  change  in  the  display  type  and/or  the 
use  of  other  sensory  channels  to  provide  task  feedback  (e.g.,  use  of  audio  or  tactile  cues). 

The  primary  limitation  of  this  section  is  that  it  does  not  provide  a  quantitative  assessment  or 
meta-analysis of the examined relationships. Although a quantitative review is desirable,
existing studies are few in number, use inconsistent operational definitions, and lack the
statistics needed to permit a meta-analysis at this time. The HRI literature would also benefit from
further  investigations  of  workload  mitigation,  such  as  the  use  of  multimodal  feedback  and/or 
automation. Existing studies in this area are promising but too few in number to provide a
complete  understanding  of  the  advantages  and  disadvantages  of  these  strategies.  In  conclusion, 
the  purpose  of  this  section  was  to  guide  future  research  by  synthesizing  the  existing  literature  on 
teleoperator  workload  for  a  range  of  HRI  tasks  and  criteria. 


3.  Autonomy  and  Automation  Reliability  in  Human-Robot  Interaction: 
A  Qualitative  Review 


The  effectiveness  and  reliability  of  automation  aids  are  critical  topics  in  the  area  of  HRI.  As 
more  tasks  are  subsumed  by  robots  and  autonomous  systems,  it  is  important  to  examine  the 
relationships  between  these  entities  and  their  human  operators.  Research  to  date  has  covered 
various  manipulations  of  autonomy,  but  this  broad  body  of  research  needs  focus  and  consistency. 
The  current  study  presents  a  qualitative  overview  of  research  regarding  levels  and  reliability  of 
autonomy/control  and  the  effects  they  have  on  important  HRI-relevant  outcome  variables. 

Results  indicate  that  autonomy  and  automation  aids  operate  uniquely  for  different  tasks,  and  that 
there  are  many  complex  factors  that  can  affect  not  only  performance  but  also  usability, 
confidence,  and  safety.  Unresolved  issues  in  the  field  and  challenges  and  opportunities  for  future 
research  are  also  presented. 

3.1  Introduction 

Robots  and  automated  systems  are  now  intertwined  more  than  ever  in  our  everyday  lives.  Robot 
and  automated  system  operators  often  interact  with  these  tools  as  they  would  with  human 
coworkers.  As  we  move  toward  more  seamless  and  transparent  interactions  between  humans  and 
robotic  entities,  it  becomes  increasingly  important  to  understand  how  these  human  operators  and 
systems  can  optimally  perform  with  the  help  of  automation  technology. 

One  purpose  of  increased  automation  is  to  lower  the  operator  workload  by  taking  on  additional 
tasks  without  prompting  the  operator  for  commands.  Empirical  research  in  the  areas  of  HRI  and 
automated  systems,  however,  has  discovered  more  complex  relationships  between  the  human 
operator,  the  automated  agent,  and  performance.  The  majority  of  research  falls  into  two  broad 
categories:  level  of  autonomy/control  (LOA)  and  automation  aid  reliability.  Research  on  levels 
of  autonomy/control  focuses  on  investigating  outcomes  when  the  balance  of  control  between 
human  and  autonomous  agent  is  manipulated.  Cueing  and  automation  reliability  research 
focuses  on  manipulating  the  accuracy  and  frequency  of  automation  aids  in  the  control  of  robots 
or  complex  semi-autonomous  systems. 
3.1.1 Levels of Autonomy/Control (LOA)

In  many  applications,  human  control  of  complex  systems  has  been  slowly  replaced  by  robots  and 
automated  systems.  Advances  in  technology  increasingly  allow  human  operators  to  simply 
observe  a  process  or  be  minimally  involved  through  safety  checks  or  a  simple  button  press. 

While  technologies  and  automation  have  fully  replaced  humans  in  many  tasks,  a  multitude  of 
situations  still  exist  in  which  humans  and  semi-autonomous  systems  or  robots  must  work 
together.  In  some  instances  this  cooperation  stems  from  a  lack  of  technology  to  fully  subsume  a 
human  operator’s  role  (e.g.,  air-traffic  control).  In  other  situations,  an  autonomous  system  is 
technologically  capable  of  fully  performing  a  task,  but  legal  or  safety  restrictions  exist  that 
require  a  human  operator  (e.g.,  hazardous  materials  handling). 

Research  in  LOA  focuses  on  manipulating  either  the  amount  of  control  a  human  operator  has 
over  an  automatic  process  or  the  amount  of  autonomy  a  robotic  entity  or  system  has  from  a 
human  operator.  Existing  research  in  this  area  falls  in  one  of  two  general  design  categories: 
human  teleoperation  of  one  or  more  robots  and  human  supervision  and  control  of 
semi-autonomous  systems. 

Researchers  have  long  noted  that  the  most  common  implementation  of  automation  in  an  applied 
setting  involves  allocating  as  much  responsibility  to  an  automated  system  as  is  technologically 
possible  (Kaber  et  al.,  2000).  If  multiple  tasks  can  be  automated  and  supervised  by  a  single 
operator,  having  a  separate  employee  perform  each  task  is  impractical.  The  consequence
is  that  operators  can  only  observe  the  process  without  interacting  with  the  system;  they
are  left  essentially  “out  of  the  loop.”  Because  most  automation  is  inherently  imperfect,  failures  of
automation  or  unsuccessful  collaboration  can  degrade  performance  below  the  level  achieved  when  the
operator  completes  the  task  without  the  use  of  any  autonomous  aid  (Endsley  and  Kaber,
1999;  Muthard  and  Wickens,  2003).

3.1.2  Automated  Aid  Reliability 

While  research  on  LOA  tends  to  focus  on  system-level  automation,  automation  does  not  always 
occur  in  every  aspect  of  a  given  task.  Much  research  exists  exploring  the  use  of  automated  aids 
and  decision-making  support  systems  that  augment  and  assist  a  human  operator-controlled  task. 

Automation  aids  typically  are  used  to  alert  a  human  to  important  information  that  is  either 
necessary  for  task  completion  or  helpful  in  completing  a  task  more  efficiently  or  effectively. 
Some  aids  simply  present  the  user  with  raw  information  in  a  more  salient  form,  such  as  an 
auditory  warning  (Wickens  et  al.,  2003).  Other  automated  aids  are  more  sophisticated  and 
aggregate  different  sources  of  information  to  make  a  recommendation  or  alert  to  the  user  by  way 
of  complex  computer  algorithms  (Wickens  et  al.,  2005).  Existing  research  in  this  area  falls  in 
one  of  three  general  design  categories:  production  systems,  targeting  tasks,  and  diagnostics 
monitoring.

More  complex  aids  aggregate  raw  data  and  present  recommendations  or  alerts  to  operators.  For
these  types  of  aids,  imperfect  calculations  can  lead  to  misleading  information  or  incorrect 
decisions.  These  automation  imperfections  can  take  the  form  of  either  false  alarms  or  misses 
(Dixon  and  Wickens,  2006).  While  these  automation  imperfections  can  be  attributed  to  a  myriad 
of  causes  (e.g.,  low-quality  video  feed,  raw  data  inaccuracy),  they  are  commonly  associated  with 
thresholds  set  in  the  decision-making  computer  algorithms  that  calculate  the  raw  data  and 
produce  alerts  and  cues.  In  many  cases,  these  thresholds  can  be  adjusted  to  make  an  automated 
aid  more  or  less  prone  to  false  alarms  or  misses  (Levinthal  and  Wickens,  2006;  Yeh  and 
Wickens,  2001). 
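
The threshold trade-off described above can be illustrated with a minimal sketch. It assumes a hypothetical aid that assigns each event an evidence score between 0 and 1 and raises an alert whenever the score meets a threshold. The scores, the ground-truth labels, and the function name alert_outcomes are invented for illustration and are not drawn from any of the cited studies.

```python
from typing import List, Tuple


def alert_outcomes(
    scores: List[float], is_target: List[bool], threshold: float
) -> Tuple[int, int]:
    """Return (false_alarms, misses) for an aid that alerts when score >= threshold."""
    false_alarms = 0
    misses = 0
    for score, target in zip(scores, is_target):
        alerted = score >= threshold
        if alerted and not target:
            false_alarms += 1  # alert raised for a non-target
        elif target and not alerted:
            misses += 1  # real target passed without an alert
    return false_alarms, misses


# Hypothetical evidence scores and ground truth for eight events.
SCORES = [0.10, 0.35, 0.40, 0.55, 0.60, 0.70, 0.80, 0.95]
IS_TARGET = [False, False, True, False, True, False, True, True]

# A low (liberal) threshold trades misses for false alarms; a high one does the reverse.
liberal = alert_outcomes(SCORES, IS_TARGET, threshold=0.30)
conservative = alert_outcomes(SCORES, IS_TARGET, threshold=0.75)
```

With these invented values, the lower threshold produces three false alarms and no misses, while the higher threshold produces no false alarms and two misses. This mirrors the false-alarm-prone and miss-prone aids compared in the reliability studies.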

3.1.3  Purpose 

The  purpose  of  this  section  is  to  explicate  the  literature  on  LOA  and  automated  aid  reliability  as 
it  relates  to  HRI.  Specifically,  this  study  examines  the  trends  present  in  these  related  streams  of 
research  to  date  and  provides  guidelines  for  future  integrative  research.  From  a  practical 
perspective,  this  investigation  seeks  to  spur  critical  thinking  in  settings  where  these  technologies 
are  used  with  the  hope  that  improvements  in  the  design,  performance,  and  usability  of  robots  and 
other  autonomous  systems  will  result.  Unlike  a  traditional  research  design  or  meta-analytical 
investigation,  this  study  aims  to  qualitatively  integrate  the  dispersed  research  on  these  topics  in 
an  effort  to  encourage  more  standardized  empirical  investigation  so  that  future  quantitative 
meta-analyses  are  a  feasible  option  for  aggregating  the  data. 

3.2  Method 

3.2.1  Literature  Search 

The  literature  search  included  a  thorough  exploration  of  published  studies,  conference 
proceedings,  and  technical  reports  from  a  variety  of  scientific  and  military  electronic  databases, 
including  ACM,  DTIC,  and  IEEE.  References  from  a  comprehensive  HRI  review  (Chen  et  al., 
2007)  as  well  as  obtained  studies  were  also  checked  for  eligibility.  Finally,  a  hand  search  was 
conducted  on  the  following  journals  and  conference  proceedings  for  the  past  5  years:  Human 
Factors,  Human-Robot  Interaction,  Human  Computer  Interaction,  Presence,  and  IEEE. 

3.2.2  Inclusion  Criteria  and  Procedure 

To  be  included  in  the  present  review,  an  article  was  required  to  report  a  study  that  experimentally 
investigated  different  levels  of  control/autonomy  present  in  an  autonomous  system  or  robotic 
control,  or  explored  the  reliability  and  accuracy  of  cuing  or  decision-making  aids  in  these 
scenarios. 

3.2.2.1  Criteria.  In  the  HRI  literature,  researchers  have  measured  user  effectiveness  and  general
performance  in  a  myriad  of  ways.  In  this  section,  the  most  common  operations  were  selected  for 
examination.  In  order  to  be  eligible  for  inclusion,  a  study  had  to  include  at  least  one  of  the 
following  criteria:  ER,  efficiency,  RT,  SA,  or  perceived  workload. 


3.2.2.2  Study  Coding.  Before  coding,  raters  reviewed  the  variables  of  interest,  constructed  a
coding  sheet  to  reflect  them,  and  used  it  to  screen  for  article  eligibility.  Five  studies  identified
as  eligible  were  then  selected  and  coded  by  all  raters  to  examine  validity  and  agreement.
Based  on  acceptable  agreement,  one  of  the  five  raters  coded  each  study  on  the  following  six
dimensions:  (1)  article  characteristics,  (2)  sample  characteristics,  (3)  research  design,  (4) 
independent  variables,  (5)  task  type  and  apparatus,  and  (6)  outcome  measured. 
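
The text does not state which agreement statistic was used. As one possible illustration only, the sketch below computes mean pairwise percent agreement for a single coding dimension. The rater labels, the codes, and the function name pairwise_agreement are hypothetical.

```python
from itertools import combinations
from typing import Dict, List


def pairwise_agreement(codes: Dict[str, List[str]]) -> float:
    """Mean proportion of studies on which each pair of raters assigned the same code."""
    pair_scores = []
    for rater_a, rater_b in combinations(codes, 2):
        matches = sum(x == y for x, y in zip(codes[rater_a], codes[rater_b]))
        pair_scores.append(matches / len(codes[rater_a]))
    return sum(pair_scores) / len(pair_scores)


# Hypothetical codes for the "research design" dimension on five studies.
EXAMPLE_CODES = {
    "rater_1": ["within", "between", "within", "mixed", "between"],
    "rater_2": ["within", "between", "within", "mixed", "within"],
    "rater_3": ["within", "between", "mixed", "mixed", "between"],
}

agreement = pairwise_agreement(EXAMPLE_CODES)
```

Percent agreement does not correct for chance agreement, so a chance-corrected statistic may be preferable when the number of code categories is small.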

3.2.2.3  Analyses.  As  research  accumulates  in  this  area,  meta-analytic  methods  may  be  applied
to  assess  the  quantitative  impact  LOA  and  automated  aid  reliability  have  on  performance
outcomes.  Existing  studies,  however,  are  few  in  number  and  inconsistent  in  the  operations  of 
study  variables.  As  a  result,  the  present  analyses  consist  of  qualitative  descriptive  summaries  of 
the  obtained  articles. 

3.3  Results 

Table  3  presents  the  studies  included  for  analysis  with  regards  to  LOA.  Included  in  the  table  is  a 
brief  summary  of  the  specific  manipulation  (IV-independent  variable)  and  task  design,  criterion 
(DV-dependent  variable)  measurement,  and  guiding  principles  for  each  study.  Table  4  presents 
the  same  summarized  information  for  studies  included  for  analysis  covering  automated  aid 
reliability. 

Immediately  evident  in  the  qualitative  analysis  of  the  included  studies  is  the  dependency  of 
results  on  the  experimental  task  employed  in  each  study.  For  example,  in  some  tasks  such  as 
search  and  rescue,  automation  led  to  improvements  in  performance  across  the  board  (Luck  et  al., 
2006).  In  other  tasks  such  as  an  air-traffic  controlling  scenario,  the  effect  of  automation  is  more 
complex  and  performance  benefits  vary  (Endsley  and  Kaber,  1999).  Similarly,  automated  aid 
reliability  research  reveals  that  for  some  tasks,  imperfect  automation  leads  to  large  performance 
decrements  (Rovira  et  al.,  2007),  while  for  other  tasks  it  leads  to  a  reliance  on  other  strategies  to 
successfully  complete  a  task  with  only  marginal  effects  on  performance  (Meyer  et  al.,  2003).

Table 3. Summary of studies examining LOA.

| Study | Manipulation and Design (IV) | Criteria Measurement (DV) | Key Findings |
|---|---|---|---|
| 1 | Manual robot control vs. shared control with robot navigating and operator focused on target ID. | Performance (no. of targets correctly identified) | For novice robot operators, performance is increased with the use of a semi-autonomous (shared control) navigation aid. |
| 4 | Ten LOAs in monitoring, generating, selecting, and implementing between human operator and automated system. | Performance (no. of points earned in targeting simulation, missed targets, and collisions) | LOAs that combine human generation of options and automated implementation produce superior results during normal system operations; joint decision making (human/system collaboration) is detrimental to performance. |
| 5 | User-controlled vs. sensor-driven control of secondary independent UGV camera. | Performance (no. of targets identified, time spent in visual target inspection) | Sensor-driven control is better; automatic gaze redirection of a UGV camera helps in close-up identification of objects in a search task. |
| 6 | Five LOAs and five schedules of automation (automation on then off for a specified time) for system control. | Performance (no. of errors, errors in secondary task), workload, SA | When automation is cycled on and off, performance is best when the human operator develops a strategy that is implemented automatically; workload correlated with secondary task performance. |
| 7 | Five LOAs range from simple support to full automation. | Performance (errors, efficiency), workload, SA | Increased automation leads to performance improvements and reduces human operator subjective workload, but also reduces SA for some system functions. |
| 9 | No aid, veto-only aid (stop to avoid damage), or semi-autonomous aid (adjusts course away from obstacles) UGV control. | Usability | Users may struggle to adapt strategies around autonomous agent control, and steering/navigation trouble may arise if the operator is unable to adjust. |
| 11 | LOA and latency for search-and-rescue UGV. | Performance (errors, time to completion), usability | Increased automation leads to performance improvements in both errors and time. It also acts as a buffer from the negative effects of control latency. |
| 17 | LOA for team of three UGVs: full autonomy, mixed control, full control. | Performance (targets identified), behavior, usability | When controlling multiple UGVs, a mixed control paradigm with both manual control of robots as well as some cooperative automation provided best performance; controllers who switched attention between robots more frequently performed better in manual and mixed control scenarios. |
| 18 | Single or dual UAV control with no aid, auditory aid, or flight path tracking automation. | Performance (errors, efficiency, RT) | Automation aid helped improve target identification tasks more when operating multiple UAVs vs. single UAV control. |

Table 4. Summary of studies examining automated aid reliability.

| Study | Manipulation and Design (IV) | Criteria Measurement (DV) | Key Findings |
|---|---|---|---|
| 3 | UAV targeting task with automation aid of varying accuracy and reliability. | Performance (errors, response time), SA | False alarm-prone automation leads to a decreased use of aids and ignoring raw data; imperfect automation leads to better detection of a target miss; high workload polarizes these effects. |
| 7 | Five LOAs range from simple support to full automation; normal operation or unexpected automation failure. | Performance (errors, time to recovery), workload, SA | Increased automation leads to performance improvements and reduces human operator subjective workload; in automation failure, lower-level LOAs with more human control result in the best performance due to increased SA. |
| 10 | Search simulation with two or four UAVs controlled with no automation aid, 90% reliable aid, 60% reliable aid prone to false alarms, or 60% reliable aid prone to target misses. | Performance (efficiency, RT) | There is a substantial cost to efficiency as users control more UAVs; automation aids that provide more false alarms are more detrimental to user performance than 90% reliable or 60% reliable automation aids emphasizing misses. |
| 12 | Automatic cuing agent for quality control decision-making task: none, low, or high validity; high vs. low overall automation. | Performance (errors) | Higher levels of automation resulted in more reliance on cues; no performance differences between automation types for valid cues, but lower automation outperformed higher automation with less valid cues. |
| 14 | Flight simulation with or without reliable attention guidance automation. | Performance (errors, efficiency), subjective confidence/trust | When automation of flight plan selection was used, pilots were more likely to ignore changes in the environment that made the flight unsafe after selection; automation is best in selection, but not necessarily implementation/monitoring. |
| 15 | Command and control targeting simulation with various levels of information and/or decision-making automation. | Performance (accuracy, RT), workload, subjective confidence/trust | Imperfect information automation and decision-making automation are both detrimental to performance; major components of failures are the lack of operator access to raw information and complacency. |
| 16 | One, two, or four UAVs controlled manually or with 95% or 100% accurate automated or by-consent decision-making aid. | Performance (errors, efficiency) | Management-by-consent automation aid resulted in best performance as it left operators “in the loop” but was scalable to increases in workload (more UAVs). |
| 19 | UAV simulation with automated diagnostics information: 100% accurate, 60% reliable with false alarms, 60% reliable with misses, or manual control. | Performance (errors, efficiency), behaviors | Increased misses by automation led to a decrease in concurrent task performance driven by reallocation of visual attention, while increased false alarms led to slower responses to all automation alarms and were followed by more time scanning the environment (raw data) to determine accuracy vs. 100% accurate alarms. |
| 20 | UAV targeting simulation with automated 75% or 100% reliable cuing for some targets. | Performance (accuracy), workload, subjective confidence/trust | Partially reliable cuing increases false alarms and eliminates overall performance benefits of cuing; cuing draws attention toward the cued target, resulting in other targets being overlooked. |


3.3.1  Overall  Analysis  of  LOA 

It  is  clear  that  some  amount  of  automation  does  in  fact  increase  overall  performance  for  primary 
tasks.  This  is  true  for  novice  robot  operators  (e.g.,  Hughes  and  Lewis,  2005)  and  UGV  and  UAV 
operators  (e.g.,  Wang  and  Lewis,  2007),  as  well  as  in  targeting  simulations  (Kaber  and  Endsley, 
2003).  In  at  least  some  conditions,  automation  can  lead  to  significant  problems,  especially  if  the 
operator  is  unable  to  access  raw  data  (Rovira  et  al.,  2007)  or  does  not  know  how  to  regain  control
of  a  robot  (Krotkov  et  al.,  1996).

While  the  notion  that  all  technology  available  be  utilized  in  an  automated  system  seems  sensible, 
our  analyses  found  a  trend  toward  the  opposite.  One-third  of  included  studies  utilized  a  version 
of  Endsley  and  Kaber’s  (1999)  10-level  LOA  taxonomy.  This  taxonomy  separates  tasks  into
four  roles:  monitoring,  generating,  selecting,  and  implementing.  Each  level  in  the  taxonomy 
assigns  either  a  human  operator,  a  computer  (autonomous  agent),  or  both  to  control  each  role. 
Results  of  studies  using  the  taxonomy  all  indicate  that  performance  is  optimal  when  the  human 
operator  generates  potential  actions  and  selects  the  desired  action;  it  is  then  automatically 
implemented  by  the  system  (e.g.,  Kaber  and  Endsley,  2003).  In  these  scenarios,  an  increase  in 
task  or  process  automation  reduces  subjective  workload  and  SA  of  the  operator  (Kaber  et  al.,
2000). 
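
The role-allocation idea behind the taxonomy can be written down as a small data structure. The sketch below is illustrative only: the class and method names are hypothetical, it does not reproduce the ten published levels, and the monitoring assignment in the example is an arbitrary placeholder because the summary above does not specify it.

```python
from dataclasses import dataclass
from enum import Enum


class Agent(Enum):
    HUMAN = "human"
    COMPUTER = "computer"
    BOTH = "both"


@dataclass(frozen=True)
class RoleAllocation:
    """Who controls each of the four roles named in the taxonomy."""

    monitoring: Agent
    generating: Agent
    selecting: Agent
    implementing: Agent

    def human_decides_system_acts(self) -> bool:
        """True for the pattern reported as optimal above: the human generates
        and selects the action, and the system implements it."""
        return (
            self.generating is Agent.HUMAN
            and self.selecting is Agent.HUMAN
            and self.implementing is Agent.COMPUTER
        )


# Monitoring is set to BOTH here only as a placeholder assumption.
example = RoleAllocation(Agent.BOTH, Agent.HUMAN, Agent.HUMAN, Agent.COMPUTER)
full_automation = RoleAllocation(
    Agent.COMPUTER, Agent.COMPUTER, Agent.COMPUTER, Agent.COMPUTER
)
```

Representing an LOA as an explicit allocation makes it easier to compare manipulations across studies that use different numbers of levels.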

3.3.2  Overall  Analysis  of  Automated  Aid  Reliability 

Across  all  included  studies,  the  reliability  and  accuracy  of  automated  aids  have  a  significant  effect
on  performance.  Automation  with  a  high  tendency  for  false  alarms  results  in  the  greatest
detriment  to  performance.  Operators  experiencing  automated  aids  with  a  high  level  of  false
alarms  tend  to  use  and  respond  to  aids  less  frequently  and  tend  to  ignore  raw  data  in  targeting
tasks  (Dixon  and  Wickens,  2006).  In  a  scenario  in  which  operators  were  required  to  respond  to
imperfect  automated  diagnostic  aids,  responses  to  all  automation  aids  were  slower  if  false  alarms
were  common,  and  raw  data  was  used  more  often,  reducing  overall  efficiency  (Wickens  et  al.,
2005).  This  greater  use  of  raw  data  contrasts  with  the  Dixon  and  Wickens  (2006)  finding;  the
difference  may  be  attributable  to  the  false  alarms,  although  more  research  would  be  welcome  to
clarify  the  apparent  discrepancy.  When  raw  data  is  not  available  to  the  operator  in  imperfect
automation  (e.g.,  false  alarm  prone)  conditions,  complacency  leads  to  further  decreases  in
performance  (Rovira  et  al.,  2007).  In  nearly  all  cases,  when  workload  was  increased,  the  overall
detrimental  effects  of  imperfect  automation  were  polarized  (e.g.,  Levinthal  and  Wickens,  2006).

Imperfect  automation  aids  also  influence  performance  through  the  reallocation  of  attention.  This 
can  occur  in  several  ways,  the  simplest  being  when  an  incorrectly  activated  alert  or  cued  target  is 
attended  to  by  an  operator  while  an  actual  target  or  event  goes  unnoticed  (e.g.,  Yeh  and  Wickens, 
2001).  Additionally,  automation  can  lead  operators  to  ignore  raw  data  for  a  portion  of  a  task  that 
has  become  automated  (Muthard  and  Wickens,  2003),  essentially  assuring  a  problematic
situation  will  arise  should  automation  fail.  In  line  with  the  findings  of  LOA  research,  automated
aid  reliability  research  fully  supports  the  notion  that  access  to  raw  data  and  avoidance  of
situations  where  operators  are  “out  of  the  loop”  are  critical  to  performance  (e.g.,  Ruff  et  al.,
2002). 

3.4  Discussion

3.4.1  Guiding  Principles 

When  the  included  studies  are  looked  at  as  a  whole,  some  general  guiding  principles  arise  from 
the  current  status  of  research  in  both  LOA  and  automated  aid  reliability.  The  most  important 
message  that  current  research  sends  is  that  technology  should  not  be  utilized  simply  because  it  is 
available.  Until  the  relationships  between  human  operators  and  a  given  technology  are 
understood  as  they  relate  to  performance,  the  application  of  that  technology  to  work  should  be
limited  to  making  recommendations.  This  review  of  research  also  sheds  light  on  the  fact  that 
there  are  many  forms  of  automation,  all  of  which  have  unique  effects  on  different  dimensions  of 
performance,  and  operate  in  a  different  way  for  varying  tasks.  This  is  an  important  practical 
implication  and  highlights  the  need  for  careful  application  of  research  findings  to  real-world 
contexts.  Last,  keeping  operators  “in  the  loop”  with  access  to  all  available  data  is  imperative  to 
successful  interactions.  Until  automation  is  perfect,  a  human  operator  will  always  need  to  know 
how  to  recover  successfully  from  failure  by  completing  a  task  the  old-fashioned  way,  without 
any  help  from  defective  automation. 

3.4.2  Unresolved  Issues  and  Future  Directions 

While  research  on  LOA  and  automated  aid  reliability  has  covered  many  important  issues 
surrounding  the  interaction  of  humans  and  autonomous  systems  and  agents,  there  is  room  for 
more  investigation.  An  area  that  has  been  largely  overlooked  in  current  streams  of  research  is 
the  differences  in  the  experience  levels  of  operators.  Whether  they  are  UAV  pilots  or  quality- 
control  supervisors,  current  research  has  largely  ignored  the  fact  that  experience  may  play  a  large 
role  in  the  interactions  operators  have  with  automation.  Some  research  has  looked  at  novice 
operators  (Bruemmer  et  al.,  2004),  but  empirical  investigations  comparing  novices  to
experienced  operators  are  needed.  For  example,  a  novice  operator  will  likely  respond  very 
differently  to  an  automation  failure  than  an  experienced  employee  who  knows  the  background 
processes  behind  the  automation. 

Keeping  operators  “in  the  loop”  with  the  task  they  are  completing  is  another  important 
determinant  of  performance  in  many  scenarios.  Research  on  interface  design  could  greatly 
facilitate  this  by  investigating  display  interfaces  that  aggregate  data  and  present  automation  aids, 
but  also  give  operators  intuitive  access  to  raw  data  should  they  need  it.  An  existing  problem  with 
operators  who  do  have  access  to  raw  data  is  the  additional  workload  associated  with  accessing  it. 
If  the  information  was  easily  available  and  intuitively  connected  to  the  related  automation  within 
an  interface,  these  two  problems  may  be  resolved.

Last,  as  technology  allows,  adaptive  automation  schemes  should  be  investigated  as  a  potential
buffer  to  the  effects  of  different  operators  or  tasks.  These  systems  could  alter  their  own  LOA 
based  on  output  performance  or  operator  responses  to  automation  aids.  For  example,  in  a 
semi-autonomous  quality-control  system,  performance  data  could  be  fed  back  into  the  system,
which  could  then  alter  the  LOA.  If  a  given  operator  is  experienced  and  performs  better  with 
more  control  of  the  system,  he  or  she  could  then  be  granted  more  control.  On  the  other  hand,  a 
novice  operator  might  benefit  from  either  higher  levels  of  automation  when  output  efficiency  is 
important  or  from  low  levels  of  automation  for  training  purposes.  Similarly,  autonomous 
systems  or  agents  might  be  able  to  predict  failures  and  correct  them  before  the  human  operator  is 
even  aware  of  a  problem.  Researchers  must  stay  one  step  ahead  of  the  application  of  new 
technologies  in  order  to  investigate  how  best  to  apply  the  advances  in  practical  settings. 
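
A minimal sketch of such a feedback rule follows. Everything in it is a hypothetical assumption rather than a scheme from the reviewed studies: the five-point LOA scale (1 is most manual, 5 is most automated), the error-rate thresholds, and the function name adapt_loa.

```python
def adapt_loa(
    current_loa: int,
    recent_error_rate: float,
    min_loa: int = 1,
    max_loa: int = 5,
    low_error: float = 0.05,
    high_error: float = 0.20,
) -> int:
    """Return the next LOA given the operator's recent error rate.

    Assumed rule: frequent errors shift control toward the automation,
    rare errors return control to the operator, and anything in between
    leaves the LOA unchanged.
    """
    if recent_error_rate > high_error:
        return min(current_loa + 1, max_loa)  # add automation, capped at max_loa
    if recent_error_rate < low_error:
        return max(current_loa - 1, min_loa)  # grant more control, floored at min_loa
    return current_loa


# An operator at LOA 3 with a 30% error rate moves to LOA 4;
# with a 1% error rate the same operator moves to LOA 2.
struggling = adapt_loa(3, 0.30)
proficient = adapt_loa(3, 0.01)
```

A training-oriented deployment could invert the rule so that novices receive less automation, which is the alternative the paragraph above raises.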

3.4.3  Summary  and  Conclusion 

The  primary  finding  in  this  study  is  the  general  lack  of  quantitative  analysis  data  in  the  fields  of 
LOA  and  automated  aid  reliability.  This  is  mainly  a  consequence  of  both  the  limited  number  of 
available  empirical  investigations  and  the  extreme  diversity  in  variable  operations  and 
measurement  in  the  existing  literature.  For  example,  ER  is  measured  in  numerous  ways, 
including  points  acquired,  targets  identified,  and  collisions  avoided.  While  these  data  inform  us
about  the  task-specific  relationships  they  examine  independently,  they  cannot  be  sensibly 
integrated  by  traditional  meta-analytical  means.  This  discovery  brings  to  light  the  need  for 
consistency  and  cooperation  among  research  in  these  areas.  More  general  investigations  are 
needed  that  can  be  flexibly  applied  to  more  tasks  (Miller  and  Parasuraman,  2003),  and  common 
methods  must  be  agreed  upon  so  that  the  findings  can  be  better  utilized  by  a  wider  audience  in 
practice. 

The  present  study’s  analysis  of  LOA  and  automated  aid  reliability  was  born  of  a  larger
investigation  of  HRI.  For  this  reason,  the  literature  search  and  resulting  studies  focus  only  on 
these  topics  as  they  relate  to  HRI.  The  benefit  of  this  methodology  is  an  in-depth  focus  on  the 
topics  as  they  apply  to  HRI.  While  the  consequences  may  be  the  exclusion  of  some  important 
non-HRI  work  in  the  areas  of  LOA  and  automated  aid  reliability,  this  focus  exemplifies  the  need 
for  consistency  in  these  areas  of  research. 

As  technology  and  automation  processes  continue  to  alter  the  way  people  interact  with  each 
other  and  machines,  researchers  must  clarify  how  best  to  use  these  modern  advances.  Common 
sense  may  dictate  that  we  use  whatever  technology  is  available,  but  careful  investigations  of  the 
application  of  automation  are  important  to  guarantee  optimal  use  of  these  complex  and  often 
expensive  tools.

4.  Effectiveness  of  Visual  Devices  in  Human-Robot  Interaction:  A  Qualitative  Review

Visual  devices  employed  in  HRI  have  matured  from  mere  tools  to  integrated  technological 
extensions  of  the  body.  Research  must  follow  suit  and  strive  to  create  a  programmatic 
framework  to  address  the  lack  of  common  operations  in  previous  literature.  The  current 
qualitative  analysis  organizes  five  commonly  manipulated  visual  modality  devices  into  three 
categories  based  on  MRT  (Wickens,  1980,  2002).  This  endeavor  synthesizes  existing  research  to 
create  a  needed  foundation  for  future  research.  Analytic  results  suggest  that  employing  robot- 
enhanced  visual  systems  aids  operators’  task  performance  when  the  visual  device  is  matched 
with  the  operators’  task.  More  importantly,  results  indicate  that  our  current  understanding  of 
visual  modalities  is  rudimentary  and  full  of  caveats. 

4.1  Introduction

HRI  exemplifies  the  use  of  technology  as  a  “force  multiplier,”  in  military  terms,  which  increases
the  physical  and  mental  abilities  of  operators  beyond  what  was  previously  feasible.  This  allows 
operators  to  outsource  cognitively  taxing  but  predictable  tasks  to  interactive  technology  systems. 
Research  has  yet  to  reach  a  methodological  and  technical  consensus,  however,  on  how  to 
maximize  HRI  given  quickly  advancing  technology,  nor  has  it  achieved  a  systematic,  coherent 
approach  to  studying  visual  modalities.  The  purpose  of  this  qualitative  review  is  to  provide  a 
summary,  organized  by  a  proposed  framework,  of  the  current  state  of  HRI  visual  modality 
research.  This  review  will  also  highlight  inconsistencies  among  variable  manipulations  and 
operations  in  an  effort  to  guide  future  research.  The  need  for  a  more  systematic  research  agenda 
should  not  be  dismissed  given  technology’s  ever-growing  pervasiveness  in  military  and  civilian 
life. 

The  present  review  identifies  common  themes  within  HRI  literature  addressing  technology’s 
enhancement  of  visual  perception.  According  to  Wickens’  (1980)  MRT,  some  tasks  can  be 
performed  in  parallel  while  others  cannot  due  to  mental  workload  constraints.  Notably,  tasks 
requiring  different  perceptual  resources  (e.g.,  simultaneously  performing  an  auditory  and  visual
task)  can  typically  be  performed  together,  whereas  two  tasks  straining  the  same  modality  will
mutually  interfere  with  task  performance  (e.g.,  performing  two  visual  tasks  simultaneously).
Perceptual  and  cognitive  overload  due  to  the  latter  can  occur  within  and  between  modalities. 

The  present  study  applies  MRT  solely  to  visual  modalities  and  perception.  Resources  can  be 
constrained  by  a  variety  of  factors,  including  time,  cognitive  processing,  and  contextual  factors. 
The  framework  proposed  in  this  study  is  built  upon  these  three  resource  constraints  (Wickens, 
2002).

Current  manipulations  of  visual  modalities  in  the  HRI  literature  can  be  organized  into  five
categories:  FR,  latency/time  delay,  SS  and  MS  visual  cues,  FOV,  and  camera  perspective.  This 
review  organizes  these  five  manipulations  into  three  conceptual  dimensions  (figure  1):  time 
resources,  contextual  resources,  and  visual  processing  resources. 

HRI Visual Modalities
    Time Resources: Frame Rate, Latency
    Visual Processing Resources: Stereoscopic v. Monoscopic
    Contextual Resources: Field of Vision, Camera Perspective

Figure  1.  Visual  modality.

4.1.1  Time  Resources 

Time  resources,  including  picture  latency/time  delay  and  FR,  describe  an  approach  in  which 
time-related  HRI  system  features  are  altered.  Such  alterations  affect  an  operator’s  ability  to 
visually  integrate  multiple  screen  views  over  time.  For  example,  Luck  and  colleagues  (2006) 
manipulated  two  forms  of  time  resources:  the  time  delay  between  a  camera  display  and  its 
operator’s  teleoperation  of  a  UGV  along  with  whether  the  latency  was  variable  or  consistent  over 
trials. 

FR  and  latency  are  frequently  addressed  simultaneously  by  experimental  methodology  or 
operationalized  as  dependent  system  responsiveness  features  (Chen  and  Thropp,  2007;  Darken  et 
al.,  2003).  Latency,  or  time  delay,  refers  to  the  temporal  discrepancy  between  an  actual  event
and  when  the  event  is  viewed  on  a  screen.  FR  is  defined  as  the  number  of  screen  shots  displayed 
over  time  or  the  image  refresh  rate  of  a  system  (typically  measured  as  frames  per  second). 
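
The two definitions can be combined in a short worked calculation. The sketch below assumes a deliberately simple model in which an event occurring just after one frame is captured is not visible until the next frame, which itself arrives after the full latency. The function names and the example link parameters are hypothetical.

```python
def frame_interval_ms(frames_per_second: float) -> float:
    """Time between successive frames, in milliseconds."""
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    return 1000.0 / frames_per_second


def worst_case_image_age_ms(latency_ms: float, frames_per_second: float) -> float:
    """Upper bound on how stale the displayed scene can be under the assumed model:
    one full frame interval of waiting plus the transmission latency."""
    return latency_ms + frame_interval_ms(frames_per_second)


# Hypothetical teleoperation link: 250 ms latency at 10 frames per second.
interval = frame_interval_ms(10)              # 100.0 ms between frames
staleness = worst_case_image_age_ms(250, 10)  # 350.0 ms worst case
```

Under this model, lowering FR and raising latency both increase how out of date the operator's view can be, which is one reason the two variables are often treated together.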

4.1.2  Contextual  Resources 

Contextual  resources  include  manipulations  of  FOV  and  camera  perspective  in  which  the 
information  given  by  the  environmental  perspective  is  changed  to  holistically  alter  the  extent  to 
which  operators  are  able  to  visually  perceive  their  surroundings.  Thus,  the  operator’s  visible 
range  of  sight  is  physically  altered  via  the  grounding  and/or  positioning  of  a  map  or  camera 
view.  For  example,  Darken  and  Cervik  (1999)  manipulated  a  virtual  map  to  either  orient  “up”  as 
north  or  in  the  direction  of  forward  movement. 

FOV  describes  the  physical  dimensions  of  the  operator’s  visual  screen  view.  A  typical 
manipulation  contrasts  a  wide-panoramic  perspective  with  a  narrow  perspective.  Camera 
perspective  is  characterized  by  the  immersion  level  of  the  camera  in  reference  to  a  target  object. 
Manipulations  often  compare  a  third-person,  or  exocentric,  camera  perspective  with  a  first- 
person,  or  egocentric,  camera  perspective.  The  latter  would  be  a  fully  immersed  viewpoint.  For 
tasks,  such  as  in  a  UAV,  which  allow  for  three  axes  of  movement  (e.g.,  left-right/yaw,  forward-
backward/roll,  up-down/pitch),  perspective  also  refers  to  whether  the  camera  view  is  gravity-  or
vehicle-based.  External  visual  manipulations  are  especially  sensitive  to  effects  of  cognitive
processing  such  that  peripheral  vision  and  direct  vision  result  in  unique  visual  interpretations  and 
the  SA  of  an  environment. 

4.1.3  Visual  Processing  Resources 

HRI studies comparing SS to MS visual cues make use of visual processing resources, serving as
a conceptual bridge between time and contextual resources. Cuing, in this case, is dependent both
on nuanced time manipulations (e.g., differential processing of latency for SS and MS
conditions) and on contextual information provided by the presence or lack of normative
binocular depth cues in SS and MS views, respectively.

MS  visual  displays  consist  of  a  2-D  image  presented  to  both  eyes  that  provides  visual  cues  like 
object size, shadows, and the interposition of objects (Draper et al., 1991). SS visual displays
present  a  3-D  image  representation  to  both  eyes  allowing  for  greater  perceived  realism  and, 
importantly  for  cognitive  processing,  retinal  disparity.  Retinal  disparity,  as  in  typical  viewing 
conditions,  allows  for  richer  visual  cues,  complex  depth  cues,  and  enhanced  visual  acuity.  Based 
on  Wickens’  (2002)  description  of  visual  channel  resources,  MS  displays  capitalize  on  peripheral 
vision  perceptual  resources,  whereas  SS  primarily  employs  focal  vision  perceptual  resources. 

4.2  Method 

We  conducted  a  literature  search  via  several  methods  to  create  a  comprehensive  HRI  database. 
First,  published  studies,  conference  proceedings,  and  technical  reports  were  obtained  via  a  search 
of  several  scientific  and  military  electronic  databases,  including  DTIC,  ACM,  and  IEEE. 
References from an HRI review (Chen et al., 2007) as well as obtained studies were checked for
eligibility.  Finally,  a  hand  search  was  performed  on  the  following  journals  and  conference 
proceedings  for  the  past  5  years:  Human  Factors,  Presence,  Human  Computer  Interaction,  and 
Institute  of  Electrical  and  Electronics  Engineers. 

4.2.1  Definitions  and  Inclusion  Criteria 

4.2.1.1 Independent Variables. To be included in the present review, an article was required to
report  a  study  that  experimentally  investigated  visual  modality  manipulations,  specifically 
system  latency,  FR,  FOV,  camera  perspective,  SS  vision,  or  MS  vision.  Studies  that  failed  to 
satisfy  these  dimensions  were  not  included  in  this  analysis. 

4.2.1.2 Criteria. Within the HRI literature, user performance criteria are often defined
inconsistently, hindering between-study comparisons of outcomes. In this section, the criterion
operationalizations most frequently measured were selected for analysis. A study had to include at
least one of the following criteria to be eligible for inclusion: ER, RT, efficiency, SA, or the
operator’s perceived workload. ER was defined as the number or percentage of incorrect
responses (or, if reverse coded, the percentage of correct responses) or as a measure of the task’s
deviation from an optimal path or solution. Efficiency was coded as the time taken to complete a
task, or as a measure of production within a standardized unit of time. RT represents the elapsed
time between the presentation of a stimulus and the appropriate response by an operator. For the
sake of distinction, efficiency assesses the overall task completion time, whereas RT measures focus
on a speeded response to specific stimuli of interest (e.g., an alarming cue). SA describes the
level of task and contextual knowledge an operator has. Finally, perceived workload reflects
self-report measures of the operator’s experienced cognitive demands, often measured by the
NASA-TLX.
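
The distinction between ER, RT, and efficiency can be made concrete with a small sketch. The `Trial` record and the function names below are hypothetical and introduced only for illustration; individual studies in the review operationalized these criteria in different ways.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trial:
    """Hypothetical record of one stimulus-response event within a task."""
    correct: bool
    stimulus_onset_s: float
    response_s: float


def error_rate(trials: List[Trial]) -> float:
    """ER as the proportion of incorrect responses."""
    if not trials:
        raise ValueError("no trials")
    return sum(not t.correct for t in trials) / len(trials)


def mean_reaction_time(trials: List[Trial]) -> float:
    """RT as the mean elapsed time from stimulus onset to response."""
    if not trials:
        raise ValueError("no trials")
    return sum(t.response_s - t.stimulus_onset_s for t in trials) / len(trials)


def completion_time(task_start_s: float, task_end_s: float) -> float:
    """Efficiency coded as overall time taken to complete the task."""
    return task_end_s - task_start_s
```

Note that RT is computed per stimulus, whereas efficiency spans the whole task, so a single session can yield a short mean RT and a long completion time at once.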

4.2.1.3 Study Characteristics. Characteristics were coded that may affect overall study results
because  they  provide  insight  on  the  study  design’s  strength  and  fidelity  to  the  operators’  tasks 
and  environment.  For  study  design,  counterbalancing  or  random  assignment  were  noted  for 
repeated  measures  and  between-group  studies.  Study  fidelity  characteristics  included  the  sample 
population  (e.g.,  military,  student,  gender,  mean  age),  the  type  of  apparatus  used  (e.g.,  high-  or 
low-fidelity  simulator),  and  the  type  of  task(s)  being  performed  by  users/operators.  Apparatus 
included  actual  or  simulated  UAVs  and  UGVs,  flight  simulators,  helmet-mounted  displays 
(HMDs),  VEs,  or  simple  computer  interfaces/simulations.  Task  type  was  coded  according  to  the 
types  of  functions  asked  of  operators  by  the  experiment.  Task  category  examples  include  robot 
navigation,  teleoperated  manipulation  of  objects,  or  targeting  critical  objects/stimuli  on  the 
interface  (e.g.,  point  and  click). 

4.2.2  Procedure 

Prior  to  coding  studies  deemed  relevant  in  the  literature  search,  raters  reviewed  the  variables  of 
interest,  constructed  a  coding  sheet  to  reflect  them,  and  used  it  to  screen  for  article  eligibility.  Of 
the  studies  deemed  eligible,  five  were  selected  and  coded  by  all  five  raters  to  assess  inter-rater 
reliability.  Acceptable  rater  agreement  was  found.  Subsequent  studies  were  each  coded  by  one 
rater  on  the  following  six  dimensions:  (1)  article  characteristics,  (2)  sample  characteristics,  (3) 
research  design,  (4)  independent  variables,  (5)  task  type  and  apparatus  used,  and  (6)  the  type  of 
outcome  measured. 
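
The report does not state which agreement index was used to assess inter-rater reliability. As one possibility only, the sketch below computes mean pairwise percent agreement across raters who each coded the same set of items; the function name and the example codes are assumptions for illustration, and a chance-corrected index (e.g., a kappa statistic) could be substituted.

```python
from itertools import combinations
from typing import List, Sequence


def pairwise_agreement(codes_by_rater: List[Sequence[str]]) -> float:
    """Mean proportion of items on which each pair of raters assigned the same code."""
    if len(codes_by_rater) < 2:
        raise ValueError("need at least two raters")
    n_items = len(codes_by_rater[0])
    if n_items == 0 or any(len(c) != n_items for c in codes_by_rater):
        raise ValueError("every rater must code the same non-empty set of items")
    pairs = list(combinations(codes_by_rater, 2))
    # Count matching codes item by item for every pair of raters.
    matches = sum(sum(a == b for a, b in zip(r1, r2)) for r1, r2 in pairs)
    return matches / (len(pairs) * n_items)
```

With five raters, as in the procedure above, the function averages over the ten possible rater pairs.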

Analyses  consisted  of  descriptive  summaries  of  the  obtained  articles  since  too  few  reported 
statistics appropriate for a meta-analytic review. These summaries include information on the
workload  manipulation  and  study  design,  study  characteristics,  the  type  of  task  and  apparatus, 
relationships  with  dependent  variables,  and  a  summary  of  results,  or  guiding  principles.  Studies 
within  articles  were  coded  separately  if  independent  samples  were  used  in  each  (e.g.,  one  article 
with  three  reported  studies  using  independent  samples  was  coded  as  studies  a,  b,  and  c). 

4.3  Results  and  Discussion 

This  section  created  a  framework  around  which  to  organize  the  various  manipulations  of  visual 
modalities,  based  on  Wickens’  MRT  (1980,  2002).  In  the  present  analysis,  10  studies 
manipulated  FR  (table  5),  7  examined  latency/time  delay  (table  6),  7  compared  MS  to  SS  visual 
cues (table 7), 10 assessed FOV (table 8), and 11 studied camera perspective (table 9). A few
studies were included in multiple categories (e.g., Lion [1993] was included in both FR and SS
vs. MS cues) since they examined more than one visual modality (Lion, 1993; Reddy, 1997; Scribner
and Gombash, 1998; Van Erp and Padmos, 2003; Watson et al., 2003). Notably, several of the
studies  reported  under  the  contextual  resource  theme  share  considerable  overlap  in  their 
conceptualization  of  FOV  and  perspective.  This  analysis  conceptualized  SS  and  MS 
comparisons  as  a  hybrid  of  contextual  and  time  resources,  in  many  cases.  As  such,  some 
ambiguity is acknowledged regarding how various studies were organized.

Table 5. Summary of studies manipulating FR.

| Study | Manipulation | Criteria | Task Type | Results |
|---|---|---|---|---|
| Calhoun et al., 2006 | FR (update rates) | Performance, SA, usability, workload | Operator-controlled UAV | With higher update rates, SA increased, usability decreased, workload decreased, and performance increased. Objective performance ratings showed no difference between FR conditions. |
| Darken and Cervik, 1999 | FR | Errors, SA, usability | Navigation of building with camera view | No significant differences found between FR video conditions; no significant learning effects. |
| Lion, 1993 | FR (33 or 22 Hz), head motion, MS vs. SS | Performance, errors | Tracking task using 3-D computer interface | Higher FR related to better performance; performance learning effects present. |
| Massimino and Sheridan, 1994 | FR (3, 5, 30 fps), presence of force feedback | Efficiency | Operator-controlled mechanical arm via camera view | Increased FR significantly improved efficiency; the addition of force feedback improved efficiency for all FR conditions. |
| Reddy, 1997 | Study A: FR (2.3, 11.5 Hz); Study B: FR (6.7, 14.2 Hz), FOV | Errors, efficiency | Navigation task in VE | Errors and efficiency decreased with lower FR. |
| Richard et al., 1996 | FR, SS, and MS vision | Efficiency | Track and grasp 3-D moving target | Higher FR coupled with MS compensated for a lack of SS visual cues; learning effects were significant. |
| Van Erp and Padmos, 2003 | FR, spatial resolution of image | Errors, efficiency, SA | Driving task | Higher FR was related with improved performance for all criteria; no significant learning effects. |
| Watson et al., 1998 | Studies A, B, C: FR (9, 13, 17 Hz) | Efficiency, errors, RT, usability | VE track and grasp of object using an HMD | With lower FR, RT increased, usability decreased, and efficiency was reduced; errors were not significantly affected. |
| Watson et al., 2003 | Studies A and B: FR, task difficulty | Errors, efficiency, usability | Operator used an HMD to complete tasks | Efficiency decreased, and errors and task difficulty increased as FR decreased. |
| Chen et al., 2008 | FR (“normal,” “reduced”) | Errors, efficiency, usability, workload, motion sickness, performance | Simulated navigation/targeting using UAVs and UGVs | No significant differences between FR conditions; for UGVs, performance (hit rates) decreased with reduced FR; no performance differences for UAV. |

Table 6. Summary of studies manipulating latency/time delay.

| Study | Manipulation | Criteria | Task Type | Results |
|---|---|---|---|---|
| Adelstein et al., 2003 | Latency, constant or random head motion rates | RT | Observed VE through HMD | Only interactions were significant: changes in motion patterns resulted in a decrease in operators’ discrimination abilities and latency detection. |
| Ellis et al., 2004 | Latency detection, environmental complexity | Errors | Navigation of VE with an HMD | Complexity of environment failed to affect operator errors; learning effects reported. |
| Lane et al., 2002 | Operator input and robot action time delay | Efficiency | Tracking and grabbing in UGV simulator | Increased time delays led to a decrease in efficiency. |
| Luck et al., 2006 | Studies A and B: latency rates, variable and fixed latency lengths | Errors, efficiency, usability | Operator-navigated VE using UGV simulator | Increased latency/time delay led to a reduction in efficiency and more errors; efficiency improved when time delay was fixed as opposed to variable. |
| Shreik-Nainar et al., 2003 | Constant or random time delay | Errors, efficiency | Navigation of VE with an HMD | When time delay was constant, as opposed to variable, errors increased and efficiency decreased. |
| Watson et al., 2003 | Image latency, system responsiveness | Errors, efficiency | Completed with HMD in VE | Significant learning effects for impact of system latency. |
| Chen et al., 2008 | Latency (250 ms vs. none) | Errors, efficiency, usability, workload, motion sickness, performance | Simulated navigation/targeting using UAVs and UGVs | No significant differences between presence or lack of latency; usability decreased with presence of latency. |

Table 7. Summary of studies manipulating SS and MS visual cues.

| Study | Manipulation | Criteria | Task Type | Results |
|---|---|---|---|---|
| Drascic and Grodski, 1993 | SS vs. MS | Errors | Object manipulation using teleoperated robot arm | SS camera view significantly reduced ERs compared to MS views. |
| Draper et al., 1991 | Studies A, B, and C: SS vs. MS | Errors, efficiency | Placement task using robot arm | No difference was present between MS and SS for low-difficulty tasks; significant differences were present for more difficult tasks such that an SS view resulted in greater efficiency. |
| Lion, 1993 | SS vs. MS, FR, head motion | Performance, errors | 3-D tracking task | SS display was significantly related to enhanced performance and a reduction in errors. |
| Park and Woldstad, 2000 | Multiple 2-D vs. 3-D MS vs. 3-D SS | Errors, efficiency, workload | Placement task using robotic arm | No significant difference between 3-D MS and 3-D SS; 2-D display outperformed both 3-D displays. |
| Richard et al., 1996 | Studies A and B: SS vs. MS, multimodal feedback type | Efficiency | Used haptic feedback glove in VE | When FR is high or other modality cues are present, SS does not present a significant advantage to MS; in baseline conditions, SS is more efficient than MS; significant learned effects were present. |
| Scribner and Gombash, 1998 | SS vs. MS, FOV | Errors, efficiency, stress, usability | UAV driving task | SS resulted in fewer errors and reduced stress scores, and was preferred by users (usability) over MS. |
| Nielsen, Goodrich, and Ricks, 2007 | Studies A and B: 2-D vs. 3-D across display types (map, video, map-video) | Errors, efficiency | UGV navigation (study A: simulation; study B: realistic) | Study A: no significant differences in completion time or errors; learning effects present. Study B: map-only display completed slower than map-video (2-D) and video-only (3-D). |

Table 8. Summary of studies manipulating FOV.

| Study | Manipulation | Criteria | Task Type | Results |
|---|---|---|---|---|
| Parasuraman et al., 2003 | FOV and computer opponent strategy | Efficiency, workload | UGV navigation in VE | FOV showed no effects on criteria. |
| Parasuraman et al., 2005 | FOV (three levels) | Efficiency, workload, SA | UGV navigation in VE | Workload increased as FOV decreased; no significant difference was present for efficiency. |
| Pazuchanics, 2006 | Narrow vs. wide FOV | Efficiency, errors, usability | UGV navigation | Widening FOV resulted in improved performance compared to narrower FOV. |
| Reddy, 1997 | Studies A and B: FOV, FR | Efficiency, errors | Navigation task in VE | Errors and efficiency were reduced with wider FOV. |
| Scribner and Gombash, 1998 | FOV, SS vs. MS | Errors, efficiency, stress and motion sickness | UAV driving task | Motion sickness was reported more frequently in wide FOV condition; no significant interaction was present between FOV and MS/SS. |
| Smyth et al., 2001 | FOV (three levels) | Errors, efficiency, workload, stress and motion sickness | UGV driving task | Wider FOV was desired for navigation but the FOV closest to typical vision was preferred for steering. |
| Smyth, 2002 | FOV (three levels) | Errors, efficiency, workload, stress and motion sickness | UGV driving task | Indirect FOV resulted in decreased driving speed and more errors compared to the baseline natural vision condition. |
| Van Erp and Padmos, 2003 | FOV (two levels) | Errors, efficiency, SA | Teleoperation of UGV | Improved performance due to wider FOV was task dependent and was not beneficial when the narrower FOV was sufficient for the task; wide FOV was significantly better when UGV made sharp turns. |
| Wang and Milgram, 2003 | FOV (six levels) | Errors, SA | Teleoperation of UGV using various camera views | SA increased as FOV extended outward from robot; the moderate FOV condition provided the best local SA and ER. |
| Draper et al., 1991 | Narrow vs. wide FOV | Efficiency, errors, usability | UGV simulated search task | Completion times (efficiency) were faster with a wider FOV; efficiency is incrementally improved when wide FOV and warning are present. |

Table 9. Summary of studies manipulating camera perspective.

| Study | Manipulation | Criteria | Task Type | Results |
|---|---|---|---|---|
| Darken and Cervik, 1999 | Map direction orientation (north-up or forward-up) | Errors, efficiency | UGV driving task using camera/map view | Forward-up alignment was best for targeted search tasks, but north-up alignment elicited the best performance for naive and primed search tasks. |
| Heath-Pastore, 1994 | Gravity based vs. vehicle based | Errors | UGV simulator driving task | Operators reported greater confidence and SA for gravity-referenced view; gravity-based perspective had fewer errors than vehicle-based perspective. |
| Hughes, 2005 | No. of cameras fixed or independent of UGV, camera alignment | Errors, usability | Search and navigation of robots; target ID | Operator-controlled cameras best for usability. |
| Lewis et al., 2003 | Gravity based vs. vehicle based | Errors, efficiency, usability | Teleoperation of UGV | Efficiency and usability were significantly better for gravity-fixed display. |
| Murray, 1995 | Fixed vs. mobile vehicle-based view | Efficiency | Target detection using camera views | Efficiency was reduced with mobile camera views vs. fixed-position cameras. |
| Olmos et al., 2000 | 2-D display vs. 3-D third-person display vs. 3-D split screen display | Error, efficiency, RT | Navigation of VR terrain in-flight simulator | 2-D display was detrimental to vertical maneuver performance, 3-D display showed greatest deficits during lateral maneuvers; split screen, when displays were made visually consistent, was significantly different from 2-D and 3-D display. |
| Schipani, 2003 | Line-of-sight view vs. non-line-of-sight | Workload | Navigation of UAV | No significant difference for workload was found between line of sight and non line of sight. |
| Thomas and Wickens, 2000 | Third-person vs. first-person view | Errors, RT, usability | Simulated teleoperation of robot | Third-person view demonstrated faster RT and fewer errors, and operators reported higher levels of confidence (usability) compared to the first-person view. |
| Draper et al., 1991 | Camera view vs. camera view inlaid in larger virtual display | Efficiency, errors, usability | UGV simulated search task | Reported usability reduced when camera perspective is inlaid (picture-in-picture display) VE display. |
| Nielsen, Goodrich, and Ricks, 2007 | Studies A and B: display type and 2-D/3-D comparison: video-only vs. map-only vs. video-map display | Errors, efficiency | UGV navigation (study A: simulation; study B: realistic) | Study A: video-only display resulted in most errors and slowest completion time; no differences between map-only and map + video, learning effects present. Study B: for 2-D displays, more errors present for video-only vs. map-only display. |
| Drury et al., 2007 | Map-centric vs. video-centric display | Errors, efficiency, SA, usability | Simulated UGV search and navigation task | Video-centric display best for usability, movement efficacy, SA, surroundings awareness; map-centric best for location and status awareness. |

4.3.1  Time  Resources 

System  latency  and  FR  were  categorized  as  time  resource  HRI  system  features.  Fourteen  studies 
reported in 10 articles address HRI FR manipulations. Of these studies, 11 measured efficiency
and errors, 7 usability, and 3 SA. Two examined workload, and three measured task
performance. Overall findings suggest that higher FR (i.e., more frames per second) increases
efficiency, reduces errors, and improves usability, among other criteria.

System  latency/time  delay  was  manipulated  in  eight  studies  within  seven  articles.  Six  of  these 
studies  measured  errors  and  efficiency.  Usability  was  assessed  by  two  studies,  and  RT  by  one. 
Findings  suggest  that  increased  time  delays  between  an  operating  system  and  its  operator  result 
in decreased efficiency and increased ER. All but one of the studies examining fixed latency vs.
variable delays reported that fixed latency delays improve operator efficiency and reduce ER.

The  HRI  literature  on  FR  and  latency  frequently  made  use  of  the  terms  interchangeably  or 
inconsistently.  Conceptual  ambiguities  were  resolved  based  upon  a  study’s  task  and  criteria. 
Generally,  higher  FR  and  decreased  latencies  benefitted  user  performance.  These  results  are 
consistent  with  the  notion  that  a  more  realistic  image  will  result  in  less  discrepancy  between 
typical  visual  processing  and  visual  processing  of  technologically  altered  stimuli.  Frequently,  a 
consistent  FR  was  used  throughout  studies.  Though  methodologically  consistent,  this  approach 
lacks  external  validity  because  FR  does  vary  within  and  across  HRI  tasks  (e.g.,  Darken  et  al., 
2003).  Thus,  experimental  studies  of  FR  often  require  less  of  the  operator’s  attention  since 
conditions are predictable. Operator awareness was also an issue for latency studies. Several
studies reported that learning yielded significant increases in performance criteria and/or that
pretask awareness training mitigated the deleterious effects of latency on performance measures
(Ellis et al., 2004; Watson et al., 2003). Future research should seek to create common
operationalizations of FR and latency, assess the specific effects of learning, and determine the
threshold for cognitive processing of a realistic/real environment in contrast to a VE.

4.3.2  Contextual  Resources 

Contextual resources include FOV and camera perspective. FOV was examined in 11 studies
within 10 articles; 10 measured efficiency, 9 looked at errors, 4 examined workload, 3 addressed
SA, and 2 accounted for self-reported stress, motion sickness, and usability. The results on FOV
are mixed but do suggest a preference for a wide to moderate FOV over a narrow FOV.* However,
Scribner and Gombash (1998) reported increased motion sickness rates when a wider FOV was
introduced.

Twelve studies addressed camera perspective; nine reported measures of error, seven assessed
efficiency, five usability, two RT, and one workload. Results should be taken as starting points
rather than principles, given the variety of manipulations. Overall, performance is maximized
when the camera perspective is an exocentric, third-person view of the environment and/or
gravity-referenced (as opposed to being referenced toward the camera’s physical direction of
movement or tilt). Additionally, when a split-screen display is present (e.g., either a third-person
perspective or 3-D image is viewed alongside a first-person or 2-D image, respectively),
performance is maximized and is incrementally better than in single-perspective conditions
(Olmos et al., 2000).

* The reader is referred to the Scribner and Gombash (1998) manuscript for complete definitions.

Despite  a  wide  range  of  methodologies  and  manipulations,  contextual  resource  study  results  all 
promote  moderation  (i.e.,  FOV  within  typical  visual  range)  and  integration  (i.e.,  perspective  and 
FOV  presenting  multiple  visual  displays).  For  example,  when  combined  with  another  workload 
reduction  task  (such  as  increasing  contextual  information),  an  FOV  manipulation  allowing  an 
operator  to  switch  between  manual  and  automated  operating  systems  positively  affected 
performance  (Pazuchanics,  2006).  This  suggests  that  integrating  contextual  resources  with  other 
interface  features  can  be  a  force  multiplier.  Relatedly,  perspectives  that  provide  either  a  third- 
person  view  or  a  stable,  gravity-based  orientation  facilitate  performance  (e.g.,  Thomas  and 
Wickens, 2000). Results underscored the utility of a user’s natural spatial ability, in addition to
learning effects, when it comes to increasing performance on criteria (e.g., Darken and Cervik,
1999).

4.3.3  Results  for  Visual  Processing  Resources 

SS and MS visual cues were examined by 11 studies within 7 articles; 8 reported errors and
efficiency while the other criteria (workload, usability, general performance, and self-reported
stress) were each assessed within a study. A pattern emerged suggesting SS views are to be
preferred over MS views with regard to efficiency and errors. Richard and colleagues (1996)
found that when other modalities act as additive cues for the operator or visual conditions are
optimal (e.g., high FR), SS is no better than MS.

The benefits of SS displays over MS displays were not as overwhelming as many researchers had
hypothesized. In baseline conditions, the added realism and depth cues provided by SS displays
did benefit operator performance. However, in the presence of other cues, such as auditory
alerts, MS displays fared as well as or only slightly worse than SS displays. Notably, the small number
of  studies  included  in  this  category  and  the  specificity  of  task  manipulations  may  bias  these 
preliminary  findings.  The  age  of  these  studies  is  also  of  interest.  All  but  two  studies  were 
published  in  the  1990s.  Beyond  this,  these  studies  lacked  consistency  among  their  task  purposes 
and  operator  instructions.  Several  studies  stress  speed,  for  instance,  over  accuracy,  and  vice 
versa.  Though  overall  results  were  inconsistent  regarding  the  advantages  of  SS  displays  over 
MS  displays,  there  is  a  consistent  trend  favoring  SS  over  MS  in  high-difficulty  situations 
requiring  greater  visual  acuity.  Thus,  the  advantages  of  each  are  highly  contingent  on  the  task 
difficulty and the presence of multimodal cueing.

4.4 Conclusions


This section’s review suggests that realistic camera images (e.g., moderate FOV, high FR, low
latency delay) are related to higher performance ratings for operators. Increased image realism
allows  operators  to  compensate  for  deficits  in  visual  processing.  Such  deficits  may  be  the  result 
of  reduced  retinal  disparity  and  depth  cues  or  overtaxing  of  focal  and  peripheral  visual 
processing  resources.  Our  preliminary  analysis  supports  the  notion  that  using  robot-enhanced 
visual  systems  aids  overall  task  performance  but  also  suggests  that  our  understanding  of  these 
relationships  is  incomplete. 

Limitations  of  this  qualitative  analysis  primarily  center  on  identifying  the  inclusion  criteria,  the 
potential  for  coding  errors  within  each  study,  faulty  translation  of  ambiguous  terminology,  and 
the  small  number  of  studies  included  within  the  five  categories.  A  shortcoming  of  the  current 
visual  modality  literature,  and  this  analysis  by  extension,  is  the  absence  of  a  shared  mental  model 
within  the  literature.  For  example,  “third-person  camera  perspective”  was  used  interchangeably 
with  an  “exocentric  perspective”  or  a  “god’s-eye  view.”  Though  study  details  suggest  these 
terms are equivalent, task demands frequently were given priority over fidelity to the
manipulation  itself. 

Another concern was the impact of learning effects beyond the study variables. Most studies
took one of two approaches to learning effects: either (1) participants completed practice trials prior to a
study’s data collection to minimize effects or (2) the study included a measure of learning effects
as  part  of  the  experiment.  Beyond  this,  the  majority  of  coded  studies  shared  significant 
methodological  constraints  due  to  their  samples,  which  were  notably  small  and  predominantly 
male. 

As  exemplified  by  the  results  of  this  qualitative  analysis,  researchers  need  to  agree  on  a  program 
of research, use a common framework to establish a shared mental model, and address numerous
literature  gaps  at  both  theoretical  and  practical  levels.  The  need  to  create  and  define  a 
programmatic  research  agenda  poses  an  imminent  challenge  to  researchers.  Utilizing  the 
framework  proposed  in  this  section  will  allow  for  a  systematic  and  theoretically  founded  means 
for  future  studies  on  visual  modalities.  In  particular,  latency/time  delay  and  camera  perspective 
warrant  greater  attention  in  order  to  create  a  more  unified,  less  fragmented  research  agenda. 
Perhaps  more  confused  than  the  manipulations  are  the  criteria.  Aside  from  reporting  ER,  no 
consensus  exists  for  measuring  other  criteria.  For  example,  within  the  area  of  FOV  research, 
studying  motion  sickness  and  SA  appears  to  be  a  fruitful  research  avenue,  but  these  criteria  are 
neglected  in  other  categories  of  visual  modality  manipulations.  Thus,  both  criteria  and  predictors 
deserve  greater  attention  and  consistency.  Without  the  latter,  few  guiding  principles  will  arise, 
and the utility of visual modalities will remain ambiguous in military and civilian operations.

5. Recommendations for Further Research


We reviewed work in the broad area of HRI; the resulting collection is available as a RefWorks (2009) dataset.

This  report  is  more  focused,  presenting  three  issues  specific  to  the  development  of  principles  for 
multimodal  displays  in  HRI  operations:  workload,  autonomy  and  automation,  and  visual  display 
issues.  Each  topic  has  developed  strengths  and  provided  guidance  to  developers.  However, 
further  work  remains  to  be  done.  As  such,  the  following  sections  present  issues  targeted  at 
further  research  in  the  area. 

5.1 Meta-Analysis of Coded HRI Articles

We  conducted  a  literature  review  to  identify  articles  dealing  with  HRI  topics.  Hundreds  of 
articles  were  screened  for  those  that  contained  data  suitable  for  meta-analysis.  These  articles 
were  further  coded  using  the  coding  scheme  developed  for  the  project.  The  coded  articles  are 
contained  in  an  online  RefWorks  database.  The  articles  were  further  organized  according  to  the 
taxonomy  of  independent  variables  listed  in  section  4  of  this  report. 

The  next  logical  step  is  to  further  code  the  articles  for  a  meta-analysis.  This  involves  taking  the 
data  from  the  articles  and  computing  common  statistics  for  the  meta-analysis.  In  certain 
instances,  authors  will  need  to  be  contacted  to  provide  the  requisite  data.  In  addition  to  the 
statistical computations, it will be necessary to form appropriate theoretical or methodological
groups of equivalent metrics and variables. The initial grouping has been completed and is
presented in section 4 of this report. For instances where there are not enough studies for a
meta-analysis, further grouping will be undertaken. This involves organizing and collapsing studies
into  groups  that  are  theoretically  similar.  For  example,  when  considering  the  studies  of  workload 
and  controlling  one  or  more  robots,  does  it  make  sense  to  aggregate  studies  of  more  than  one 
robot  into  an  ordinal  scale?  Similar  questions  exist  about  the  role  of  potential  moderators  and 
their  relation  to  outcomes. 
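
As a minimal sketch of the "common statistics" step, assume each coded study reports means, standard deviations, and sample sizes for two conditions (e.g., one robot vs. two robots). The following computes a standardized mean difference (Cohen's d) per study and a fixed-effect, inverse-variance weighted pooled estimate. The study values and function names are hypothetical illustrations, not data from the coded articles, and a random-effects model may be more appropriate when studies are heterogeneous.

```python
import math

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference and its approximate sampling variance."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    return d, var_d

def fixed_effect_pool(effects):
    """Inverse-variance weighted mean effect and its standard error."""
    weights = [1.0 / v for _, v in effects]
    pooled = sum(w * d for w, (d, _) in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return pooled, se

# Hypothetical studies: (mean, SD, n) for condition A, then for condition B.
studies = [
    (52.0, 10.0, 20, 45.0, 10.0, 20),
    (60.0, 12.0, 15, 55.0, 12.0, 15),
    (48.0, 8.0, 30, 46.0, 8.0, 30),
]
effects = [cohens_d(*s) for s in studies]
pooled_d, pooled_se = fixed_effect_pool(effects)
print(f"pooled d = {pooled_d:.3f}, SE = {pooled_se:.3f}")
```

Studies that report only test statistics would need a conversion to d before pooling, which is one reason authors may have to be contacted for the requisite data.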

5.2  Cumulative  Sum  Methodology  for  Modeling  Training  Effectiveness/Skill  Decay 

Section 3 of this report contains an article describing the cumulative sum technique. Used in a
variety of areas (most recently for medical error issues), it holds potential as a modeling
technique when the purpose is monitoring the process of skill acquisition, skill decay, or both.
It is flexible and tailorable, and may prove more effective in providing feedback to
warfighters training to criterion performance levels (e.g., TALON robot operations). We
propose developing the technique as a training feedback system for use both in the United States
and with deployed forces.
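
The monitoring idea can be sketched as follows, assuming a one-sided cumulative sum over binary trial outcomes (1 = failure, 0 = success). The reference value `k`, the decision limit `h`, the function name, and the trial record are all illustrative assumptions; the formulation in the section 3 article may differ.

```python
def cusum_failures(outcomes, k=0.2, h=2.0):
    """One-sided CUSUM over binary trial outcomes (1 = failure, 0 = success).

    k is the allowable failure rate per trial and h is the decision limit;
    both defaults are illustrative assumptions.
    Returns the running sums and the 1-based index of the first trial at
    which the sum reaches h (None if it never does).
    """
    s, path, alarm = 0.0, [], None
    for t, x in enumerate(outcomes, start=1):
        s = max(0.0, s + x - k)      # accumulate excess failures, floor at 0
        path.append(round(s, 10))
        if alarm is None and s >= h:
            alarm = t
    return path, alarm

# Hypothetical record for one trainee: an early error, a proficient run,
# then a cluster of errors after a period without practice.
trials = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
path, alarm = cusum_failures(trials)
print(path)
print("first alarm at trial:", alarm)
```

A rising sum flags performance worse than the criterion rate, so the same chart could serve both as feedback during training and as an indicator of skill decay afterward.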
5.3 Focus Groups/Structured Interviews for Lessons Learned

Figure  2  illustrates  some  of  the  main  themes  that  emerge  from  a  review  of  the  academic  and 
scientific  HRI  literatures.  What  appears  to  be  lacking,  however,  is  a  user  perspective  on 
important  aspects  of  HRI  for  the  warfighter.  It  would  be  useful  to  gather  experienced  users  of 
one  or  more  robot  types  (e.g.,  FCS  technologies  network,  TALON,  iRobot,  PackBot)  together  for 
focus groups and/or structured interviews to extract lessons learned on in-theater robot
operations. Issues to be addressed include those listed in figure 1 as well as others that emerge
from the sessions. This information would be compiled into a report that could be used to
guide  further  HRI  research,  systems  design,  and  training  that  would  be  directed  specifically  to 
the  needs  of  the  warfighter. 


Figure  2.  Model  depicting  active  areas  of  HRI  research. 
5.4 Structural Equation Modeling of HRI Effectiveness

Structural  equation  modeling  can  yield  systematic  and  reliable  insight  from  accumulated  data. 
For  example,  given  a  collection  of  focus  group  research,  information  can  be  collected  on  the 
experience  and  effectiveness  of  the  operators  who  serve  as  participants.  Experience  could  then 
be  operationalized  via  such  measures  as  the  amount  of  time  operating  specific  robot  types, 
number  of  field  exercises,  number  of  missions,  and  so  forth.  Effectiveness  is  operationalized  via 
objective data, when available, or by self-reports based on past missions. These data, along with
existing attitudinal, aptitude, and training data, can be used to develop conceptual/theoretical
models of warfighter HRI performance. These models are then tested via structural equation
modeling to determine the importance of different constructs for warfighter HRI performance.

5.5  Multilevel  Data  Structures  and  Models 

Research  in  HRI  is  needed  in  many  areas,  such  as  workload  modeling,  team  vs.  individual 
operators  for  different  numbers  of  robots,  and  the  effectiveness  of  multisensory  interfaces  given 
differing  task  requirements.  Many  research  issues  studied  thus  far  and  reported  in  the  literature 
have  not  employed  a  multilevel  analysis  even  though  the  problems  and  data  are  hierarchically 
structured. 

Many  kinds  of  data  have  a  hierarchical  or  clustered  structure  such  that  units  at  one  level  are 
nested  within  units  at  the  next  level.  Examples  include  offspring  grouped  within  families, 
students  in  classes,  and  individual  workers  in  teams.  For  repeated  measures  data,  the 
observations  or  measurements  are  nested  within  individuals.  The  lowest  level  of  observation  is 
called  level  1  (e.g.,  offspring,  students),  followed  by  level  2  (e.g.,  families,  classes),  and  so  forth. 
Theoretically,  there  may  be  any  number  of  levels  to  such  a  structure,  but  in  practice,  most 
empirical studies focus on 2- or 3-level data. Multilevel models are also known as hierarchical
linear models, mixed models, and random coefficients models. For an introduction and/or
complete treatment of the methodology, see Bryk and Raudenbush (2001), Goldstein (1995),
Hox (1995), Kreft and de Leeuw (2000), and Snijders and Bosker (1999).

In multilevel model frameworks, variables are often measured at each level. The subscript i is
used to represent unit i in level 1, and subscript j is used to represent unit j at level 2. An
example is individual i in group j. The outcome (response) variable measured on each level-1
unit is designated $y_{ij}$ and represents the measure for individual i in group j. Predictor
variables at level 1 are designated $x_{ij}$, and at level 2 are designated $z_j$.

Analysis of multilevel data in the past has been performed in different ways, but each has
associated problems. The approaches and their problems are as follows:

1. Disaggregation (total or pooled regression analysis): Regression is conducted on the full
data set, and the unit of analysis is the individual. The general model is
$y_{ij} = a + b x_{ij} + e_{ij}$, where a is the intercept, b is the beta weight, and $e_{ij}$ is
the error. The model can expand to accommodate level-2 predictors:
$y_{ij} = a + b x_{ij} + c z_j + e_{ij}$, where c is the beta weight for the level-2 variable
$z_j$. Problems with this approach include severely biased estimates of effects (regression
coefficients tending to be too large and standard errors too small), resulting in inflated
type-1 ERs. Goldstein (1995) provides a full discussion of this and additional problems.

2. Aggregation: This approach is used to analyze the data at the level-2 unit, or group level in
our example. The model for the data is $\bar{y}_{\cdot j} = a + b\bar{x}_{\cdot j} + e_j$, and with
a level-2 predictor, $\bar{y}_{\cdot j} = a + b\bar{x}_{\cdot j} + c z_j + e_j$. The primary
limitation of the aggregation approach is that it ignores all level-1 variation (within group in
our example). The result is that no interpretations can be made about effects or relationships at
the individual level. Attempted interpretations lead to the “ecological fallacy” (Robinson, 1950).

3. Analysis of covariance (ANCOVA): The ANCOVA model has the individual as the unit
of analysis, and the independent variable is categorical with levels defined by the level-2
units (groups in our example). The level-1 predictor variable, x, is a covariate. The intent
of the analysis is to test for an effect of level-2 units (groups) on y, after removing the
effect of x. The model is $y_{ij} = a_j + b x_{ij} + e_{ij}$, and the coefficient b is assumed to
be invariant across level-2 units, while the intercept $a_j$ is allowed to vary across those
level-2 units. Problems with this strategy include the unrealistic assumption (in most cases) of
an invariant slope and the inability to incorporate level-2 variables z into ANCOVA.

4. Separate regressions: This approach is to conduct separate regression analyses within each
level-2 unit, with the model stated as $y_{ij} = a_j + b_j x_{ij} + e_{ij}$. The procedure yields
separate estimates for intercepts and slopes for each level-2 unit, which can be used to compute
variability in the resulting estimates of $a_j$ and $b_j$. These estimates can be used as dependent
variables in second-stage regressions, predicting these values from the level-2 variable, z:
$a_j = c_0 + c_1 z_j + e_{aj}$ and $b_j = d_0 + d_1 z_j + e_{bj}$. Results indicate the degree to
which variability in intercepts and slopes among level-2 units is predictable from z. While this
approach comes closer to taking into account the multilevel nature of the data, it is problematic
in several ways, including high unreliability in level-2 slopes and intercepts, and no
partitioning of the variance of y into within- vs. between-group portions. Also, it is impractical
with a large number of level-2 units and involves the estimation of a large number of parameters
(Goldstein, 1995; Kreft and de Leeuw, 2000).
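
The contrast among the disaggregation, aggregation, and separate-regression approaches can be illustrated with a small numerical sketch. The data are invented for illustration (three hypothetical teams whose within-team slopes are all negative while team means rise together), and the helper name `ols` is an assumption rather than part of any cited method.

```python
from statistics import mean

def ols(x, y):
    """Return (intercept, slope) of a simple least-squares regression."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical data: {team: (x values, y values)} for three operators per team.
teams = {
    "team_a": ([1.0, 2.0, 3.0], [5.0, 4.0, 3.0]),
    "team_b": ([4.0, 5.0, 6.0], [8.0, 7.0, 6.0]),
    "team_c": ([7.0, 8.0, 9.0], [11.0, 10.0, 9.0]),
}

# Disaggregation: pool every individual and ignore team membership.
all_x = [xi for x, _ in teams.values() for xi in x]
all_y = [yi for _, y in teams.values() for yi in y]
_, pooled_slope = ols(all_x, all_y)

# Aggregation: regress team means of y on team means of x.
_, aggregated_slope = ols([mean(x) for x, _ in teams.values()],
                          [mean(y) for _, y in teams.values()])

# Separate regressions: one slope per team.
within_slopes = {name: ols(x, y)[1] for name, (x, y) in teams.items()}

print(pooled_slope, aggregated_slope, within_slopes)
```

In this constructed case the pooled and aggregated slopes are positive even though every within-team slope is negative, which is the kind of misreading that motivates a model with explicit within- and between-group components.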

5.5.1  The  Linear  Multilevel  Model 

A basic assumption of the multilevel model is that there exists a population of units at each level,
and we obtain a random sample from each population. For example, if our level-2 units are
schools, we assume a population of schools and we obtain a random sample of schools. At
level 1, there is a population of students within each school, and we obtain a sample of
students from within each school. The outcome variable $y_{ij}$ is the score on outcome variable y
for level 1, unit i, from level 2, unit j. An example is a reading test score for student i from
school j. For level-1 predictors, $x_{ij}$ is the score on predictor x for level 1, unit i, from
level 2, unit j. For example, $x_{ij}$ might represent the number of hours spent reading each week
for student i from school j. For level-2 predictors, $z_j$ is the score on predictor z for level 2,
unit j, e.g., a binary indicator of public (1) vs. private (0) for school j.

The level-1 model specifies the outcome variable as a linear function of level-1 predictors. In
the previous example, the reading test score is expressed as a linear function of the number of
hours spent reading each week: $y_{ij} = a_j + b_j x_{ij} + e_{ij}$, where $a_j$ is the intercept
for level 2, unit j, and is the predicted value of $y_{ij}$ when $x_{ij} = 0$. The coefficient
$b_j$ is the slope for level 2, unit j, and represents the predicted change in $y_{ij}$ for a
1-unit increase in $x_{ij}$. In our example, it is the predicted change in reading test scores for
student i in school j when the number of hours spent reading increases by one unit. The slopes and
intercepts vary across level-2 units. The residual, $e_{ij}$, is that portion of $y_{ij}$ not
accounted for by $x_{ij}$. Level 1 has residual variance, $\mathrm{Var}(e_{ij}) = \sigma^2$, which
represents the variance in the outcome variable not accounted for by level-1 predictors. In our
example, this is the variance in reading scores not accounted for by the number of hours spent
reading.


The level-2 model specifies level-1 random coefficients as a linear function of level-2
predictor(s). For example, level-1 slopes and intercepts in equations for predicting reading test
scores from the number of hours spent reading are represented as linear functions of the public-
vs. private-school indicator. The level-2 model for level-1 intercepts is
$a_j = \gamma_{00} + \gamma_{01} z_j + u_{0j}$, where $\gamma_{00}$ and $\gamma_{01}$ are the
intercept and slope in the model. The term $\gamma_{00}$ is the predicted value of $a_j$ when
$z_j = 0$, and $\gamma_{01}$ is the predicted change in $a_j$ for a one-unit increase in $z_j$.
These are fixed coefficients. In our example, they are the intercept and slope of the model that
predicts, from the public- vs. private-school indicator, the level-1 intercepts of the relationship
between reading test scores and the number of hours spent reading. The residual term in the model
is $u_{0j}$ and represents that portion of $a_j$ not accounted for by $z_j$. The variance of the
residual is $\mathrm{Var}(u_{0j}) = \tau_{00}$. The level-2 model for level-1 slopes is expressed as
$b_j = \gamma_{10} + \gamma_{11} z_j + u_{1j}$, where $\gamma_{10}$ and $\gamma_{11}$ are the
intercept and slope in the model. The term $\gamma_{10}$ is the predicted value of $b_j$ when
$z_j = 0$. The predicted change in $b_j$ for a one-unit increase in $z_j$ is expressed as
$\gamma_{11}$. These are fixed coefficients, and in our example they are the intercept and slope of
the model that predicts, from the public- vs. private-school indicator, the level-1 slopes of the
relationship between reading test scores and the number of hours spent reading. The residual term
is $u_{1j}$ and represents the portion of $b_j$ not accounted for by $z_j$. The variance of the
residuals is expressed as $\mathrm{Var}(u_{1j}) = \tau_{11}$. The residuals in the two equations
have a covariance, $\mathrm{Cov}(u_{0j}, u_{1j}) = \tau_{01}$.

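Through substitution from the level-2 equations for $a_j$ and $b_j$ into the level-1 model for
$y_{ij}$, the overall model for $y_{ij}$ is given as

$$y_{ij} = a_j + b_j x_{ij} + e_{ij}, \tag{1}$$

$$y_{ij} = (\gamma_{00} + \gamma_{01} z_j + u_{0j}) + (\gamma_{10} + \gamma_{11} z_j + u_{1j})\, x_{ij} + e_{ij}, \tag{2}$$

and

$$y_{ij} = \gamma_{00} + \gamma_{10} x_{ij} + \gamma_{01} z_j + \gamma_{11} z_j x_{ij} + u_{1j} x_{ij} + u_{0j} + e_{ij}. \tag{3}$$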

It is important to note two aspects that set equation 3 apart from a conventional regression
model. The first is the presence of an interaction between a level-1 and a level-2 predictor. This
important characteristic allows for the modeling of cross-level interactions. The second is the
presence of distinct error terms for different levels and aspects of the model, which properly
accounts for the multilevel data structure in defining these errors.

This basic model is easily extended to include multiple level-1 and level-2 predictors through the
addition of another subscript on the x’s and z’s as well as their coefficients.
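
To make the substitution concrete, the sketch below generates data from the two-stage form (level-2 equations feeding the level-1 equation) and from the combined form of equation 3, and checks that they agree. All coefficient values, variances, and sample sizes are hypothetical, and the two level-2 residuals are drawn independently, which amounts to assuming a zero covariance between them.

```python
import random

def simulate(n_groups=5, n_per_group=4, seed=1):
    """Generate two-level data and return rows of
    (group j, z_j, x_ij, y from the two-stage form, y from the combined form)."""
    rng = random.Random(seed)
    g00, g01, g10, g11 = 50.0, 5.0, 2.0, -0.5    # fixed coefficients (hypothetical)
    rows = []
    for j in range(n_groups):
        z = float(j % 2)                          # level-2 predictor, e.g., public vs. private
        u0 = rng.gauss(0.0, 2.0)                  # level-2 residual for the intercept
        u1 = rng.gauss(0.0, 0.3)                  # level-2 residual for the slope
        a_j = g00 + g01 * z + u0                  # level-2 model for intercepts
        b_j = g10 + g11 * z + u1                  # level-2 model for slopes
        for _ in range(n_per_group):
            x = rng.uniform(0.0, 10.0)            # level-1 predictor
            e = rng.gauss(0.0, 1.0)               # level-1 residual
            y_two_stage = a_j + b_j * x + e       # equation 1
            y_combined = (g00 + g10 * x + g01 * z + g11 * z * x
                          + u1 * x + u0 + e)      # equation 3
            rows.append((j, z, x, y_two_stage, y_combined))
    return rows

rows = simulate()
max_gap = max(abs(r[3] - r[4]) for r in rows)
print(len(rows), "observations; largest difference between forms:", max_gap)
```

The term `g11 * z * x` is the cross-level interaction, and `u1 * x + u0 + e` is the composite error whose variance depends on x, which is what distinguishes this model from a conventional regression.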

1. Parameter estimation: The previous model can be fit to a hierarchically structured sample
of observations. The model implies a moment structure (means, variances, covariances) of
measured variables that is expressed as a function of a set of parameters: level-1 residual
variance ($\sigma^2$), level-2 fixed coefficients ($\gamma_{00}$, $\gamma_{01}$, $\gamma_{10}$,
$\gamma_{11}$), and level-2 residual variances and covariances ($\tau_{00}$, $\tau_{11}$,
$\tau_{01}$).

2. Model specification: Models will be specified in order to address our research questions.
Within each question, a series of models ranging from simple to complex will be specified,
first introducing level-1 predictors and then introducing level-2 predictors. This general
strategy is recommended by Snijders and Bosker (1999) and Bryk and Raudenbush (1992).
Comparison of residual variances between models indicates the variance accounted for by
each set of predictors.


5.5.2  HRI  Research 


Consider the following research question: To what extent is the warfighter’s performance with a
robot influenced by the (1) warfighter’s personal attributes, (2) robot’s attributes, and (3) team
performance factors?


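This can be expressed in a general equation form:

$$\underbrace{\text{Warfighter and robot performance}}_{\text{outcome variable}} = \underbrace{\text{warfighter personal attributes}}_{\text{level-1 variables}} + \underbrace{\text{robot attributes}}_{\text{level-1 variables}} + \underbrace{\text{team level factors}}_{\text{level-2 variables}} \tag{4}$$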


The first set of analyses focuses on predicting the warfighter’s and robot’s performance from two
sets of level-1 predictors (warfighter’s personal attributes [e.g., experience] and robot’s
attributes [e.g., interface type]) and one level-2 predictor (team level factor). A series of
models ranging from simple to complex will be examined, and a comparison of the residual variances
between models will indicate the amount of variance accounted for by each set of predictors
(Snijders and Bosker, 1999; Bryk and Raudenbush, 1992). Beginning with the warfighter’s personal
attributes, variables will be entered as a set, and those significantly related to warfighter and
robot performance will be retained. Following this, the robot attributes variables will be
entered as a set, and those significantly related to warfighter and robot performance will be
retained. Each model will have a deviance statistic reflecting the residual variance. The
deviance statistic allows for a comparison of the remaining variance from each model and
facilitates model comparisons. The team level variables result in a continuum of performance.
This allows for an examination of cross-level interactions between the team level factors (level
2) and the warfighter attributes (level 1) as well as the robot attributes (level 1). These models
will be statistically compared via the deviance statistic.
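
One common way to carry out such a comparison, sketched here under the assumption that the models are nested and were fit by full maximum likelihood, is to refer the difference in deviances to a chi-square distribution with degrees of freedom equal to the number of added parameters. The deviance values below are hypothetical, and the helper names are illustrative.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed-form starting value for df = 1 (odd) or df = 2 (even).
    q = math.erfc(math.sqrt(half)) if df % 2 else math.exp(-half)
    k = 1 if df % 2 else 2
    # Step up two degrees of freedom at a time.
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q

def compare_deviances(dev_simple, dev_complex, extra_params):
    """Deviance difference between nested models and its chi-square p-value."""
    diff = dev_simple - dev_complex
    return diff, chi2_sf(diff, extra_params)

# Hypothetical deviances: a model with warfighter attributes only, then the
# same model with two robot-attribute predictors added.
diff, p = compare_deviances(1250.4, 1241.1, 2)
print(f"deviance difference = {diff:.1f}, p = {p:.4f}")
```

A small p-value would suggest that the added set of predictors reduces the deviance by more than chance alone would explain.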

5.5.3  Multilevel  Models  of  Repeated  Measures 

Another series of analyses focuses on the dynamic relationship that exists in HRI operations. In
these repeated measures data, the sample of individuals is measured on repeated occasions. These
data can be viewed as having a multilevel structure, where level-1 units are trials and level-2
units are missions. Trials are nested within procedures, and procedures are nested within
individuals (level 3). Within the multilevel framework there is no need for the measurements to
be the same for each individual. Also, the spacing and number of occasions can vary across
individuals.

The  basic  two-level  model  is  easily  extended  to  accommodate  level-3  units  (Bryk  and 
Raudenbush,  2001;  Goldstein,  1995;  Kreft  and  de  Leeuw,  2000;  Snijders  and  Bosker,  1999). 

Level-3 predictor measures are all variables that, up to this point, have been considered level-1
predictors (e.g., warfighter’s personal attributes, robot attributes, team factors). Following the
strategy just described, the level-1 predictors will be examined one at a time, followed by the
level-2 predictor procedure, then the level-3 predictors. Models from least to most complex will
be evaluated and compared with the deviance statistic.

5.6  HRI  Operators  Field  of  Vision 

Manipulating  the  type  of  information  that  is  provided  by  visual  modalities  and  available  to 
operators  of  TALON  robots  in  military  operations  will  provide  increased  knowledge  on 
mechanisms to maximize performance criteria. Key performance criteria to assess include user
ERs,  efficiency,  workload,  motion  sickness,  and  general  operator  usability.  The  following 
paragraphs  discuss  three  future  research  projects  that  center  around  manipulations  of  camera 
perspective’s  FOV,  environmental  immersion,  and  multimodal  feedback. 

Previous  studies  on  manipulating  an  operator’s  FOV  have  reported  enhanced  performance  when 
a  wide  or  “natural”  FOV  was  present  compared  to  a  narrow  FOV  (Parasuraman  et  al.,  2005; 
Scribner  and  Gombash,  1998;  Smyth,  2002;  Smyth  et  al.,  2001;  Van  Erp  and  Padmos,  2003).  In 
light  of  these  findings,  it  is  hypothesized  that  performance  will  improve  (e.g.,  reduced  errors  and 
workload,  enhanced  efficiency  and  SA)  as  the  FOV  provided  by  the  robot-mounted  camera 
perspective  increases.  In  this  experiment,  FOV  would  be  manipulated  at  three  levels:  narrow, 
“natural  visual  range,”  and  wide.  In  addition,  we  would  measure  operators’  subjective  reports  of 
motion sickness. Scribner and Gombash (1998) found that though a wider FOV increases performance,
it also is associated with a greater incidence of motion sickness. Given the deleterious effects of
motion sickness on operators, understanding the advantages and disadvantages of varying levels
of FOV is needed.

For  operators  to  effectively  navigate  a  terrain  with  a  teleoperated  robot,  a  robot-mounted 
camera’s  perspective  is  an  important  consideration.  Previous  research  suggests  that  an 
exocentric,  or  third  person,  perspective  compared  to  an  endocentric,  or  first  person,  perspective 
enhances performance and subjective reports of usability (Olmos et al., 2000; Thomas and
Wickens, 2000). Additionally, Olmos et al. found that a split screen display, which gives the
operator immediate access to both perspectives, is better than a single perspective. Given the
aforementioned  findings,  it  is  hypothesized  that  an  exocentric  perspective,  compared  to  an 
endocentric  TALON-mounted  camera  perspective,  will  decrease  errors  and  increase  efficiency 
and  usability.  Additionally,  a  single  camera  screen  shot  will  be  less  effective  than  a  split-screen 
shot  of  both  perspectives.  Thus,  three  conditions  will  be  present:  an  exocentric  perspective  only, 
an  endocentric  perspective  only,  and  a  split-screen  endo-  and  exocentric  camera  perspective.  All 
conditions  will  employ  3-D  camera  displays  given  that  previous  studies  have  reported  the 
benefits  of  3-D  over  2-D  perspectives  due  to  the  increased  depth  cues  available,  realism,  and  SA 
(Drascic  and  Grodski,  1993;  Olmos  et  al.,  2000;  Scribner  and  Gombash,  1998). 

Last,  another  research  avenue  to  pursue  involves  comparing  a  single  visual  modality  condition  to 
a  multimodal  feedback  condition.  Park  and  Woldstad  (2000)  found  that  the  addition  of  another 
modality  cue  alleviated  differences  between  visual  perspective  conditions  that  were  present  when 
only  visual  modalities  were  compared.  Thus,  a  study  with  conditions  comparing  visual  feedback 
only (camera view), visual and force feedback (vibrotactile feedback provided as the operator
controls the TALON using a joystick), and visual and audio cuing alerts would assess whether the
simple  addition  of  another  sensory  cue  could  improve  operator  performance  without  a  significant 
change  to  the  robot’s  camera  perspective. 

5.7  Robot  Autonomy  and  Errors 

TALON  robot  operators  are  an  ideal  population  to  research  further  in  the  areas  of  LOA  and 
automated  aid  and  cuing  reliability.  Manipulating  the  amount  of  control  an  operator  has  over  a 
TALON  robot  or  the  amount  of  autonomy  the  robot  or  teams  of  robots  have  is  a  simple 
investigation  that  could  yield  information  directly  applicable  to  the  battlefield. 

Existing research on LOA supports the notion that increased automation generally leads to
increases in performance (Wickens et al., 2003); however, boundary conditions are being
identified. For example, Chen and Joyner (2009) state that for nonprimary tasks in a
multitasking environment, degradations of human performance are often observed. Furthermore,
Chen and Terrence (2009) provide evidence that certain individual differences such as attentional
control play a role in operators’ interaction with automated systems. Unfortunately, the robots
and systems used in this research vary greatly, and most results have been task- or
equipment-specific. Additionally, having access to raw data (e.g., a video feed rather than just
collision sensors) and changes in workload (e.g., controlling one vs. two or more robots) affected
performance differently in the various equipment and scenarios used to date (Rovira et al., 2007).
Given the wide array of performance effects indicated by previous research, it is imperative that
these factors be examined as they directly apply to TALON operators and the unique demands of
the environments they operate within. A future research project in this area would be limited by
the technologies capable of being integrated into the TALON systems. If Endsley and Kaber’s
(1999) 10-level LOA taxonomy were used, monitoring, generating, selecting, and implementing
action could be manipulated in a search and rescue scenario. Based on previous research, it is
hypothesized that performance will be optimal when the human operator generates potential
actions and selects the desired action; that desired action is then automatically implemented
(Kaber  and  Endsley,  2003).  Other  potential  variables  of  interest  in  this  investigation  are  the 
operator’s  experience  level  and  the  number  of  robots  controlled.  Important  criteria  in  addition  to 
performance  are  SA,  subjective  workload,  and  general  usability. 

Previous research on automation aid reliability has found considerable deleterious effects on
performance when aids and cues used in the control of robots are inaccurate or not dependable
(Wickens et al., 2003). Like LOA research, existing studies on automation aid reliability cover a
wide gamut of scenarios and equipment, for which results have been mixed. Overall, automation
aids prone to false alarms appear to have the most significant negative impact on performance
across various tasks (Dixon and Wickens, 2006). Imperfect automation can also affect
performance by incorrectly allocating attention to a noncritical target, resulting in a true
target or an enemy going unnoticed. The effect of automated aid reliability on TALON operator
performance should be investigated to discover whether the same problems arise when aids are
imperfect. A simple navigating and targeting scenario would be used with an automated target
detection aid to draw the operators’ attention to potential targets. The reliability of the aid
would be manipulated as well as the proportion of false alarms vs. misses. Based on previous
research, it is hypothesized that more targets will be identified under the more reliable aid and
that performance will be the worst in the high false alarm condition. Recent research is
identifying situations in which individual operator differences interact with false-alarm-prone
and miss-prone automation. See Chen and Terrence (2009) for these and other findings.
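
The proposed manipulation of aid reliability and of the split between false alarms and misses could be set up along the following lines. This is only a sketch of how trial schedules for such conditions might be generated; the function name, trial counts, reliability levels, and false-alarm shares are all hypothetical choices, not values taken from the cited studies.

```python
import random

def aid_schedule(n_trials, reliability, fa_share, seed=0):
    """Build a shuffled list of aid outcomes for one experimental condition.

    reliability: proportion of trials on which the aid is correct.
    fa_share:    proportion of the aid's errors that are false alarms
                 (the remainder are misses).
    """
    n_errors = round(n_trials * (1.0 - reliability))
    n_fa = round(n_errors * fa_share)
    n_miss = n_errors - n_fa
    schedule = (["false_alarm"] * n_fa + ["miss"] * n_miss
                + ["correct"] * (n_trials - n_errors))
    random.Random(seed).shuffle(schedule)   # fixed seed keeps the order reproducible
    return schedule

# Hypothetical 2 x 2 design: reliability (90% vs. 60%) by error type emphasis.
conditions = {
    "high reliability, false-alarm-prone": aid_schedule(40, 0.90, 0.75),
    "high reliability, miss-prone": aid_schedule(40, 0.90, 0.25),
    "low reliability, false-alarm-prone": aid_schedule(40, 0.60, 0.75),
    "low reliability, miss-prone": aid_schedule(40, 0.60, 0.25),
}
for name, sched in conditions.items():
    print(name, sched.count("false_alarm"), sched.count("miss"))
```

Fixing the counts per condition, rather than sampling each trial independently, keeps the realized reliability identical for every operator in a condition.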